Introducing a Hierarchical Attention Transformer for Document Embeddings

DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2019

Introducing a Hierarchical Attention Transformer for document embeddings
Utilizing state-of-the-art word embeddings to generate numerical representations of text documents for classification

VIKTOR KARLSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Master in Computer Science
Date: December 8, 2019
Supervisor: Hamid Reza Faragardi
Examiner: Olov Engwall
School of Electrical Engineering and Computer Science
Swedish title: Introduktion av Hierarchical Attention Transformer för dokumentrepresentationer

Abstract

The field of Natural Language Processing has produced a plethora of algorithms for creating numerical representations of words or subsets thereof. These representations encode the semantics of each unit, which enables immediate use in word-level tasks. Document-level tasks, on the other hand, require special treatment in order for fixed-length representations to be generated from documents of varying length.

We develop the Hierarchical Attention Transformer (HAT), a neural network model which utilizes the hierarchical nature of written text for creating document representations. The network relies entirely on attention, which enables interpretability of its inferences and allows context to be attended from anywhere within the sequence.

We compare our proposed model to current state-of-the-art algorithms in three scenarios: datasets of documents with an average length (1) less than three paragraphs, (2) greater than an entire page and (3) greater than an entire page with a limited amount of training documents. HAT outperforms its competition in cases 1 and 2, reducing the relative error by up to 33% and 32.5% respectively.
HAT becomes increasingly difficult to optimize in case 3, where it did not perform better than its competitors.

Summary (Sammanfattning)

Within the field of Natural Language Processing there exists a plethora of algorithms for creating numerical representations of words or smaller units. These representations capture the semantic properties of the words, which can be used directly for problems at the word level. One example of such a problem is named entity recognition. Problems at the document level, on the other hand, require special approaches to enable the creation of fixed-length representations even when the document length varies.

This degree project develops the Hierarchical Attention Transformer (HAT) algorithm, a neural network that takes advantage of the hierarchical structure of documents to combine word-level information into a document-level representation. The network is based entirely on attention, which makes it possible to use information from the whole sequence and to understand the model's conclusions.

HAT is compared against the currently best-performing document classification algorithms in three scenarios: datasets of documents with an average length (1) shorter than three paragraphs, (2) longer than a whole page and (3) longer than a whole page where the number of training documents is limited. HAT performs better than its competitors in cases 1 and 2, where the error was reduced by up to 33% and 32.5% respectively. Optimizing HAT became more complex in case 3, for which the result did not beat the competitors.

Contents

1 Introduction
  1.1 Background
  1.2 Research question
    1.2.1 Delimitation
    1.2.2 Relevancy and business value
  1.3 Research methodology
  1.4 Ethics, sustainability and societal impact
  1.5 Outline
2 Background
  2.1 Neural networks
    2.1.1 Architecture
  2.2 Training techniques
    2.2.1 Dataset management
    2.2.2 Back-propagation
    2.2.3 Techniques for improving model performance
  2.3 Recurrent neural networks
  2.4 Transformer
    2.4.1 Overview
    2.4.2 Encoder
  2.5 Natural Language Processing
    2.5.1 Bag of words
    2.5.2 Tf-idf
    2.5.3 Language modelling
  2.6 Contextualized word embeddings
    2.6.1 Context vectors
    2.6.2 Bidirectional Encoder Representations from Transformers
  2.7 Our contribution
    2.7.1 Extracting features from BERT for documents
    2.7.2 The Hierarchical Attention Transformer
3 Related works
  3.1 Smooth Inverse Frequency
  3.2 Paragraph Vector
  3.3 Document Vector Through Corruption
  3.4 Hierarchical Attention Network
  3.5 Word Mover’s Embeddings
    3.5.1 Word Mover’s Distance
    3.5.2 Word Mover’s Embeddings
4 Methods
  4.1 Research questions
  4.2 Method
    4.2.1 Baseline comparison
    4.2.2 Classification of page-long documents
    4.2.3 Limiting the number of training instances
  4.3 Training procedure
    4.3.1 Hierarchical Attention Transformer
    4.3.2 Smooth Inverse Frequency
    4.3.3 Word Mover’s Embeddings
    4.3.4 Dataset management and model evaluation
5 Results
  5.1 Baseline comparison
  5.2 IMDB-long-n
6 Discussion
  6.1 Baseline comparison
    6.1.1 BBC Sport
    6.1.2 OHSUMED
  6.2 IMDB-long-n
  6.3 Validity discussion
    6.3.1 Internal validity
    6.3.2 External validity
7 Conclusions
  7.1 Future work
Bibliography
A Attention examples

Chapter 1 Introduction

Written text has for centuries enabled humanity to accumulate knowledge across generations. It has also made communication across both distance and time possible. It is reasonable to believe that keeping track of and categorizing these documents never posed a problem during the infancy of this technology.
However, with the inception of the Internet and the democratization of speech, it is no surprise that this medium has exploded in volume. Categorizing even a fraction of this data would today be an unreasonable task for humans. This has led to many new and interesting problems within the field of Machine Learning (ML), automatic document categorization being one of them.

Enabling computers to make sense of the infinitely varying and complex structure of written language has been an ongoing research area for more than four decades [1]. Major breakthroughs have been achieved during the last few years [2, 3], enabling new use cases. This is partly due to the ever-growing amount of available text data together with cross-pollination of ideas from different research fields within ML. Transfer learning techniques have been brought from the field of Computer Vision to Natural Language Processing (NLP). This has enabled word representation models with millions of parameters to be trained on datasets of billions of words [3] for use in new domains where data sparsity might otherwise prevent their success.

1.1 Background

An early approach for enabling computers to understand text dates back to the mid 1950s. At that time, a document was represented by a list of its word counts [1]. This technique, called the Bag of words model, has some glaring issues but can still produce useful document representations for classification and clustering. Its two main shortcomings are its total neglect of word ordering and of similarity between words, both of which are invaluable for complete language understanding. These two issues can be solved by training a model to find numerical representations of words which encode their semantics. This approach results in the representations of similar words being close in the high-dimensional embedding space, thus preserving their semantic relatedness [2].
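
The two shortcomings of the Bag of words model can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `bag_of_words`, the whitespace tokenization and the example sentences are assumptions made purely for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Hypothetical helper: count lower-cased, whitespace-separated words."""
    return Counter(text.lower().split())


# Illustrative sentences (assumed, not from the thesis).
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
c = bag_of_words("the hound bit the man")

# Shortcoming 1: word order is discarded, so two sentences with
# opposite meanings receive identical representations.
order_is_lost = (a == b)

# Shortcoming 2: every word is its own unrelated dimension, so the
# near-synonyms "dog" and "hound" contribute nothing to the overlap.
shared_count = sum((a & c).values())  # counts shared via "the", "bit", "man"
hound_in_a = a["hound"]               # "hound" never occurs in sentence a

print(order_is_lost, shared_count, hound_in_a)
```

In this sketch the first two sentences are indistinguishable, while "dog" and "hound" are treated as entirely different words. Learned word embeddings target the second limitation by placing related words close together in the embedding space.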
The growing interest in deep neural networks has enabled researchers to construct multi-million parameter models able to create even richer encodings for words. The performance of these models further closes the gap to human performance in many different tasks, from question answering to document classification [3].

A research area born from these achievements has focused on how to use this low-level information provided for each word to create meaningful representations of sentences, paragraphs or even documents. A lot of work has gone into clever weighting schemes [4, 5, 6] and neural models, both linear [7] and recurrent structures [8], with the common goal of generating document embeddings. There are, to the best of our knowledge, still particularly interesting techniques yet to be examined, which is what this thesis will explore.

1.2 Research question

This thesis will investigate a novel technique for creating document representations from pre-trained embeddings of the words within. To examine this, we will answer the following research questions:

• Is there merit in using state-of-the-art word embeddings together with a neural network model only relying on attention to create document embeddings for use in a classification problem?
• How does the amount of training data in the case of longer documents affect the performance of our proposed model compared to current document embedding algorithms?

The merit of our attention model can only be evaluated when compared with state-of-the-art document embedding algorithms. These will be described in later chapters. Further, we consider longer documents to be those spanning more than 500 words. This is usually the amount that fits on a single A4 page, which closely coincides with the 512-token limit of Devlin et al.’s algorithm [3].

1.2.1 Delimitation

This thesis will only evaluate the proposed model together with the embedding algorithm presented in [3], even though it is possible to use other embedding algorithms. The performance is expected to vary with other models, but this will not be studied. Further, this work is limited to studying the effect of varying the amount of training data for one particular dataset.