Automated Text Summarization for the Enhancement of Public Services

Xingbang Liu, Janyl Jumadinova
Allegheny College, Department of Computer Science
Meadville, PA 16335

Abstract

Natural language processing and machine learning algorithms have been shown to be effective in a variety of applications. In this work, we contribute to the area of AI adoption in the public sector. We present an automated system that was used to process textual information, generate important keywords, and automatically summarize key elements of the Meadville community statements. We also describe the process of collaboration with My Meadville administrators during the development of our system. My Meadville, a community initiative supported by the city of Meadville, conducted a large number of interviews with the residents of Meadville during community events and transcribed these interviews into textual data files. Their goal was to uncover the issues of importance to the Meadville residents in an attempt to enhance public services. Our AI system cleans and pre-processes the interview data, then, using machine learning algorithms, it finds important keywords and key excerpts from each interview. It also provides searching functionality to find excerpts from relevant interviews based on specific keywords. Our automated system allowed the city to save over 300 hours of human labor that would have been needed to read all interviews and highlight important points. Our findings are being used by the My Meadville initiative to locate important information from the collected data set for ongoing community enhancement projects, to highlight relevant community assets, and to assist in identifying the steps to be taken based on the concerns and areas of improvement identified by the community members.

Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Following the successful implementation of Artificial Intelligence (AI) technologies in private sector companies, government agencies have started to adopt AI techniques for different applications such as health care (Sun and Medaglia 2019), public safety (Kouziokas 2017), social welfare (Capgemini 2017), and education (Timms 2016). These AI applications have benefits of cost savings, increase of public employees’ productivity by reduction of their workload, new employment opportunities, solutions to the resource allocation problems and enhancement of citizens’ satisfaction, but they also present challenges in their successful implementation and use (Wirtz, Weyerer, and Geyer 2019). AI in the public sector is still a young, emerging field of research and continued extensive research is needed to explore the full potential of AI in the public sector, and to leverage various AI technologies to address important problems and needs. A report by Harvard (Mehr, Ash, and Fellow 2017) specifies six types of government problems that could benefit from AI applications the most: resource allocation, large data sets, expert shortage, predictable scenarios, procedural and repetitive tasks, and diverse data aggregation and summarization. This paper makes a contribution in this direction: we present an intelligent system for knowledge extraction from a large textual data set to address the important issue of public service enhancement. We also demonstrate a successful application of this AI technology in the city of Meadville and discuss the process of engaging various stakeholders in the city government and the challenges that we have encountered during this process.

The city of Meadville, located in Northwestern Pennsylvania, has faced many of the same issues challenging small rural towns across our country, such as population decline, a dwindling tax base and economic turbulence. In order to investigate and combat some of these issues, in 2015 the City of Meadville applied for and received a grant from the Pennsylvania Humanities Council and the Orton Family Foundation. The goals of the grant were to build a greater connection with Meadville residents, develop a stronger sense of community identity, and a vision for the future rooted in what matters most to community members. The My Meadville Heart and Soul initiative was formed to accomplish the aims identified in this work. Hosted by the Redevelopment Authority of the City of Meadville, the My Meadville Heart and Soul initiative interviewed over 700 Meadville residents to learn which places, issues and services residents care about the most. The recorded interviews were then transcribed and collected into a single data set.

In this project, we first cleaned the data procured by the My Meadville initiative and then used it in our knowledge extraction system. Our goal was to develop a system that could be used by the city of Meadville and relevant community members and organizations to get insight and context into the good and important work that is being done in Meadville, and uncover ways to improve public relations and services that are meaningful to Meadville residents. Using natural language processing techniques, such as sentiment analysis and named entity recognition, and machine learning algorithms, our system processes community statements provided by My Meadville, finds important keywords, and then produces a summary of the key excerpts from the data. With the support of the City of Meadville and City Council, our findings are used as an aid by the My Meadville initiative to develop community value statements, highlight relevant community assets and to develop an action plan based on the concerns and areas of improvement identified by the community members. An example of the value statement is as follows:

Children and Youth: We value youth-centered programming and safe, accessible spaces that support youth and prepare them for a fulfilling future.

Supporting Data:

Youth-Centered Programming:
“... Value diverse accessible opportunities for children to engage in and build community.”
“... Appreciates a community that connects resources to education to families to concerns about the whole child through multiple systems ...”
“... Cares about options for youth activities that keep youth occupied, such as a skating rink or pool hall ...”
“... Appreciates teen-friendly activities that are affordable, diverse, and purposeful ...”

Safe and accessible spaces:
“... Values playgrounds and other safe places for kids to play ...”
“... Treasures a community that offers a variety of safe inside and outside recreational activities for the children and youth of Meadville.”
“... Appreciates activities out of school for students to do so they do not get involved with criminal activities.”
“... Appreciates safe community events, places, and activities for families that bring people together ...”
“... Appreciates walkable options for youth engagement so that families don’t stress over transportation ...”

Support Youth:
“... Values a community that appreciates the voices and input of our youth and nurtures their new ideas ...”
“... Appreciates the local opportunities for youth engagement and the ways in which they are connected to each other ...”
“... Appreciates a community that values the voices of children and youth and provides avenues for them to show their giftedness and to express themselves in meaningful and beneficial ways ...”

Some examples of the Action Plan items related to the value statement above include:
1. Create a group dedicated to creating an empowered network of teens within the community.
2. Sustain the summer parks program.
3. Create a map of public parks.
4. Increase available transportation for youth.

We also developed a searching functionality for our automatically generated summary of the community statements, where various organizations and city governmental agencies are able to search the summaries for specific information. We then trained the My Meadville administrators to use our system with this searching functionality, which automatically generates a Markdown file on the web for easy access and interpretation of the search results. In 2019, this extended functionality was utilized to gain insight into the previous summer parks program in Meadville from the residents’ stories and to use that information to revitalize and enhance this program in Meadville. Additionally, a separate research study into the museum creation used our system to uncover the need and desire for such a project in Meadville.

To summarize, the contributions of our work presented in this article are as follows:
• An automated system that can process textual information, generate important keywords, and automatically summarize key findings of large amounts of text.
• An additional search tool that allows users to search the text for specific information and generates results in an easily readable format.
• A use case of our system by the My Meadville initiative to aid in generating value statements and action plans and by the city administrators to study the development of specific projects in the city of Meadville.
• A description of the process of collaboration with the public officials and My Meadville volunteers to ensure transparency and to build trust.

Related Work

AI in the public sector is a relatively young field but there have been a number of articles demonstrating its use and outlining potential challenges. For example, in (Androutsopoulou et al. 2019), the authors present a new approach in the use of chatbots in the public sector to improve the communication between the government and citizens. The presented approach is built using natural language processing, machine learning and data mining technologies and it develops a digital channel of communication between the government and citizens. This digital channel uses existing data collected from documents containing legislation and directives, structured data from government agencies’ operational systems, and social media data to facilitate and promote information seeking and conducting of transactions. The presented approach was validated through a series of application cases with the cooperation of three Greek government agencies. Given a number of interesting research articles such as the one described above, which demonstrate successful deployment of AI in the public sector, Wirtz et al. (Wirtz, Weyerer, and Geyer 2019) analyzed and summarized scientific literature related to AI applications in the public sector. They categorize the research of AI in the public sector as follows: (1) AI government service, (2) working and social environment influenced by AI, (3) public order and law related to AI, (4) AI ethics, and (5) AI government policy. They also identify specific AI applications that are valuable for the public sector and present a Four-AI-Challenges Model that incorporates main aspects of AI challenges. Valle-Cruz et al. (Valle-Cruz et al. 2019) investigate the AI trends in the public sector by surveying 78 recent papers related to this area. Their findings indicate that only normative and exploratory research articles have been published so far and that many public policy challenges face this research area. The authors, however, also outline various benefits of AI application in public health, policies on climate change, public management and decision-making, improving government-citizen interaction, personalization of services, analyzing large amounts of data, detecting abnormalities and patterns, and discovering new solutions through modeling and simulations.

There have also been a number of articles proposing theoretical frameworks for the successful adoption of AI in the public sector. Chen et al. (Chen, Ran, and Gao 2019) present a four-stage model for AI development in the public sector in order to guide public administrators in its use and navigate the impact AI would have on their organizations. The authors present an application case of AI for delivering public services in local government in China. The outcomes of their application of AI in the local government in China could present the research community with a longitudinal study for AI in the public sector. In (Engin and Treleaven 2019), the authors present a taxonomy of government services to provide an overview of data science automation being deployed by governments world-wide. They present a review of the studies and projects across the world and propose a technological framework on the development of the AI technologies to transform the public sector. Our work belongs to the category of research articles that describe the successful development and deployment of AI technology for public sector use. Our developed AI technology builds upon several existing natural language processing techniques, such as lemmatization and named entity recognition, uses a number of machine learning algorithms for sentiment analysis and word, phrase and sentence extraction, and builds upon existing open-source projects, such as PyTextRank (Nathan 2016).

Automatic keyword and keyphrase extraction is the process of selecting words and phrases from the text that can project the core sentiment of the whole text automatically, and it has become one of the fundamental steps in information retrieval, text mining, and natural language processing applications (Siddiqi and Sharan 2015). In (Beliga, Meštrović, and Martinčić-Ipšić 2015), the authors present a survey for the task of keyword extraction, concentrating on the graph-based methods. Graph-based representation of text allows for the document to be modeled as a graph, where words are represented as nodes and their relations are represented as edges. They find that graph-based keyword extraction techniques are domain and language independent, thus making them robust and easy to apply to knowledge extraction problems, such as text classification, search, and summarization.

Text summarization is a process of extracting the most important features of a text and compiling them into a short text (Eduard and Lin 1998). Various approaches to text summarization have been proposed (Aggarwal and Zhai 2012). In query-based text summarization a specific portion of the text is utilized to extract the essential keywords from the input document to make the summary of the corresponding document. The extractive text summarization process identifies important information (words or sentences) from the input text using statistical and linguistic features of the sentences and makes the summary of the corresponding document using the most relevant sentences (Gupta and Lehal 2010). Similar to this technique, abstractive text summarization finds the important sentences in the text but it uses new concepts and expressions to describe them by generating a new shorter text that conveys the most important information from the original text document.

Supervised and unsupervised machine learning techniques have been used for the text summarization task. In supervised learning based text summarization labeled data sets are used for training. For example, in (Thomas, Bharti, and Babu 2016), the authors used a hidden Markov model to automatically extract keywords for text summarization and used a human annotated keyword data set to train their model. However, it is often difficult to find enough labeled data for training; thus, Wong et al. (Wong, Wu, and Li 2008) developed a semi-supervised learning method for co-training by combining labeled and unlabeled data. They demonstrate that their method is comparable to the supervised learning approach but it only requires about half of the labeling time cost. Unsupervised learning based text summarization does not require any labeled data to be used in training. For example, in (García-Hernández et al. 2008), the authors used a k-means clustering algorithm to generate groups of similar sentences and the most representative sentence was further selected for the summary.

In our work, we use a graph-based keyword, keyphrase and sentence extraction technique and use an unsupervised learning based method for text summarization of My Meadville community statements provided in the interview data.

AI System for My Meadville Interviews

Development Process

After the data was collected by My Meadville we were approached to find ways of automating the process of knowledge extraction. With over 700 text documents, the scarcity of trained volunteers and the lack of funding for professional help, they were seeking ways to make the knowledge extraction from this data feasible. In our initial discussions we identified the goals that the City of Meadville had and possible technological solutions for the data knowledge extraction. We decided to follow an agile development method, with a feedback loop for My Meadville administrators at the design stage and various development stages. Over a period of a year in 2018, we continued to meet with My Meadville initiative administrators and adjust our development given their feedback. Once our system was fully implemented and tested we gave a demonstration to the My Meadville committee. We then discussed various avenues for adoption and extended use of our system. Following these discussions we prepared and conducted training for the My Meadville committee and its volunteers on using our system and interpreting its results.

Figure 1 stages: Input files (.docx files), Pre-processing, Processing, Summarization.

Steps within the stages shown in Figure 1:
Input files: convert DOCX files to TXT files.
Pre-processing: extract responses in the input files; remove special signs and new lines from responses; convert texts to JSON objects.
Processing: rank all words, phrases and sentences; use sentiment analysis to detect strong feelings; find organization names.
Summarization: combine excerpts and keywords; convert summary into Markdown format for easy access.

Figure 1: Different stages of our system
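
The pre-processing stage named in Figure 1 can be illustrated with a minimal Python sketch. It assumes, hypothetically, that interviewee responses are tagged with a `Response:` prefix (the tag format is not stated in this paper) and that each cleaned response becomes one JSON object. The function names, the regular expression, and the JSON field names are illustrative and are not taken from our implementation.

```python
import json
import re
from typing import List

# Hypothetical tag marking interviewee text; the actual tag is not given.
RESPONSE_TAG = "Response:"


def extract_responses(transcript: str) -> List[str]:
    """Keep only the lines tagged as interviewee responses."""
    responses = []
    for line in transcript.splitlines():
        line = line.strip()
        if line.startswith(RESPONSE_TAG):
            responses.append(line[len(RESPONSE_TAG):].strip())
    return responses


def clean_text(text: str) -> str:
    """Drop special signs and collapse new lines and repeated spaces."""
    # Keep letters, digits, whitespace, and basic sentence punctuation.
    text = re.sub(r"[^A-Za-z0-9\s.,!?'-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def to_json_objects(doc_id: str, transcript: str) -> List[str]:
    """Convert each cleaned response into a JSON string."""
    return [
        json.dumps({"id": f"{doc_id}-{i}", "text": clean_text(r)})
        for i, r in enumerate(extract_responses(transcript))
    ]
```

Interviewer questions are skipped because they lack the assumed tag, which mirrors the decision to use only interviewee responses in the later stages.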

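The step "rank all words, phrases and sentences" in Figure 1 relies on TextRank-style voting (Mihalcea and Tarau 2004). The sketch below is not the PyTextRank code that our system builds on; it only illustrates the voting iteration on a small undirected word co-occurrence graph. The window size, the damping factor of 0.85, the iteration count, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_graph(words: List[str], window: int = 2) -> Dict[str, Set[str]]:
    """Link each word to the words that follow it within `window` positions."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


def textrank(graph: Dict[str, Set[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Score nodes by repeated voting between linked nodes."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            # A vote from `nb` is weighted by nb's own score and shared
            # equally among all the nodes that nb links to.
            votes = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) + damping * votes
        scores = new_scores
    return scores


def top_keywords(words: List[str], k: int = 3, window: int = 2) -> List[str]:
    """Return the k highest-scoring words."""
    scores = textrank(build_graph(words, window))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A word that co-occurs with many other words collects more votes, and votes from highly scored words count for more, which is the behavior the ranking step depends on.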
My Meadville Interview Data

In 2015, My Meadville initiative committee members recruited and trained dozens of volunteers to perform interviews and to transcribe them. Then, for a period of two years the volunteers conducted interviews with hundreds of Meadville residents at various community events, local schools, businesses and organizations. These audio interviews were manually transcribed into Word text files by the volunteers and compiled into a data set that was then used by our system.

Upon reviewing the data set we found that the interview transcriptions did not conform to a single format. For example, in some transcriptions the responses were identified as such, and in others they were not, and we needed to read the document to decide which text was the interviewer's comments and questions and which belonged to the interviewee. Since the goal of My Meadville was to extract information from the interviewees, our system only uses the responses in the interviews. Therefore, before using the collected data set in our system we manually looked through each document and whenever necessary made appropriate tags to identify the interviewee response so that it could be recognized by our system.

All participants whose voices are included in the final data set have signed a waiver that allows My Meadville to use their stories to continue to strengthen the values and visions of our community. Also, all data is anonymous: all identifying information in the interviews was removed. The hope of My Meadville was that the collected information can provide insight and context into the good and meaningful work that is being done in Meadville, and that this data can be used as a tool to improve the efficiency, purpose, and relationships that make such work possible.

Intelligent Text Extraction and Summarization

Our AI system processes the textual My Meadville community statements, finds important keywords, and then produces a summary of the key excerpts from all data. It utilizes several open source projects, such as Stanford Named Entity Recognizer (NER), Scikit-learn Sentiment Analysis, and PyTextRank, and consists of multiple stages as identified in Figure 1. The text documents provided by My Meadville were saved as .docx files. We first automatically convert them to .txt files (extractdocx 2016). Then we go through a pre-processing stage, where the responses are extracted from the interviews, and then cleaned by removing special characters, signs, etc. Finally, the pre-processed interview responses are converted to JSON files for use in the subsequent stages.

During the processing stage the text is analyzed and ranking and sentiment analysis are conducted. The text rank layer builds a word graph for voting on the importance of the word based on the baseline approach outlined in (Mihalcea and Tarau 2004). This graph model has the ability to extract key phrases and rank the phrases and sentences. Ranking is done through the idea of voting, where a vote is a connection of one node to another, that is, when one node links to another node it is essentially voting for that node. The higher the number of votes for a node, the higher the importance of the node. The model also takes into account the importance of the vote itself, with the TextRank score of a node being calculated based on the votes it receives and the score of the nodes voting for that node.

We build our system based on the implementation of the TextRank method (Nathan 2016) with modifications. This implementation is written in Python and uses the spaCy (Honnibal and Montani 2017), NetworkX (NetworkX developer team 2014), and datasketch (datasketch 2018) tools. The importance calculation is performed in three layers. In the first layer, statistical parsing and tagging is performed on a document in a JSON format. An example output of this layer can be seen from the top image in Figure 2. The second layer collects and normalizes the key phrases from the document produced by the previous step. Finally, during the third layer a score for each sentence is calculated using the Jaccard distance between key phrases determined by TextRank and each of its sentences. The middle and bottom images in Figure 2 show example outputs of the second and third layers.

Figure 2: Example: TextRank layers (panels: Layer 1, Layer 2, Layer 3)

The feature vectors built by the text ranking method produced ranked key phrases and sentences as seen in Figure 2. This output was combined with the output from the sentiment analysis to determine top ranked sentences to be used in extractive summarization of the document. The Scikit-learn machine learning library (Scikit-learn 2018) was used to perform sentiment analysis and to produce a sentiment score. First, training on the data from Twitter and the UCI machine learning database was done with the use of the pickle module to store the trained model. The UCI data contains sentiment labeled sentences from Amazon, Yelp, and IMDB contained in positive.txt and negative.txt files. In order to split the data into training and testing sets, we calculated the frequency distribution of each word for both positive and negative sentences. The top 5,000 words were kept as features, then the features were pickled into one file and shuffled. The testing data included the last 10,000 features, whereas the training data included the previous 10,000 features. Once the training was performed, six algorithms were used for testing, including naive Bayes, multinomial naive Bayes, Bernoulli naive Bayes, logistic regression, linear support vector classification, and stochastic gradient descent classifier. These algorithms were used to predict the sentiment of the text. In this work we considered text important (high rank) if the outcome of sentiment learning indicated that the text has strong feelings (high score). The sentiment score was represented as a floating-point number and ranged from 0 (negative) to 1 (positive), where having a score closer to 0 or 1 indicated strong feelings. For each classifier we obtained the accuracy score from nltk and the results from the best algorithm were chosen, where the key phrases and sentences with strong positive or negative feelings were selected to be included in the summary.

We also leveraged the Named Entity Recognition technique before producing the summarization of the document. Stanford Named Entity Recognizer (NER) (Finkel, Grenager, and Manning 2005) uses an advanced statistical learning algorithm to extract named entities. The NER has three classes, including person, organization, and location entities. In our application, upon consultation with the My Meadville administrators, only organization entities were detected.

After text ranking, sentiment analysis and named entity recognition, all key phrases and sentences determined to be important by these three techniques are extracted for summarization.

Experimental Results

Data set            706 interviews
Average interview   9,161 words
Shortest interview  334 words
Longest interview   30,465 words

Table 1: Information about the data set used in the experiments

The implementation of our system was written in Python 3.6.7 and the experiments were run on Ubuntu Linux 4.15.0. The overview of the data set used in our experiments is shown in Table 1. To evaluate the correctness of our system, we first tested it on five text documents randomly selected from our data set. We carefully read and annotated the text and then manually compared the output produced by our system with our annotations to verify its accuracy.

After manual testing, we ran our system on the complete data set, which produces a textual summary and a list of keywords for each interview document. Two examples of the output produced by our system are shown in Figure 3, where the extracted summary and a list of keywords are included for each of the two documents. The top output corresponds to the interview with 1,274 words and the second interview document contains 2,842 words.

Keywords: family, fun things, friends, strong little community, fun games, baseball, taxes, more jobs, Baldwin Reynolds house, park, history, houses
Summary: I love all the fun things you can do with friends and family and I feel it’s like a strong little community, we all work together and have fun . What matters to me most is my friends here. My favorite memory about living in Meadville is probably playing baseball and doing all these fun games at baseball. Some stuff to make living here easier would be cutting taxes, creating more jobs , and making it so we’re more of a strong economy. All the history we have is important , because our history dates clear back to the 1700s . And there ’s just so many things you can do like the Baldwin Reynolds House, and even Diamond Park is history. My one wish for Meadville would be to make all the houses look nice.

Keywords: community, larger town, great education system, crawford county fair, family-friendly town, great family atmosphere, economy, jobs, newer buildings, good honest living, historical places
Summary: What I love most about the city of Meadville is that it has all of the attractions and items of a much larger town, but it has a very small-town, family-friendly oriented community . What matters most to me is having a great education system for my children. My favorite memory would probably have to be the Crawford County Fair. Having a great family atmosphere and a family-friendly town was important to us as we raised our family. Our families grew up here in this area and we ’re happy to be around them, and stay in a great community . Seeing a good, solid base in the economy would be something that would make us stay here. So between a strong education system and also jobs that people can have that people can make a good honest living off of will keep people here, and keep the town thriving . What draws us here and keeps us here is all the items of a large town but in a much smaller community oriented town. I would probably go with newer buildings versus all the effort and time and money that it would take to restore a lot of these historical places.

Figure 3: Example output produced by our system for two interview documents.

Searching Functionality

The summary and keyword information produced by our system was very well received by My Meadville. In our discussions with them we have estimated savings of over 300 hours of human labor when reading condensed summaries and keywords instead of the complete interviews during their work of compiling value statements and locating supporting data. During one of our feedback discussions, My Meadville administrators requested a possibility of implementing the searching functionality to locate specific information in the summarized output quickly. This was especially important because of a number of new research and community projects in the city that could benefit from the data.

We built a supplemental searching tool to accompany our text extraction system that allows the user to search for specific keywords. This tool goes through each output (summary and keywords) produced by our system and tries to match the keyword specified by the user with one of the keywords identified by our system. If there is a match, the keyword, the interview document name, and the text surrounding the keyword are reported in a Markdown file. After the output of all documents has been checked, the Markdown file is uploaded to the GitHub repository, which can be viewed by the relevant parties. An example of an automatically generated and uploaded search result file is shown in Figure 4, where partial results for the “park” keyword are shown. Overall, the searching functionality added to our system proved to be very useful and it allowed the city and the local community to utilize the data for specific projects as needed.

Figure 4: An example of the searching output.

Conclusion and Future Directions

In this paper we describe an intelligent text summarization system that also identifies important keywords and allows users to search the results for user-specified keywords. We also present an application of this system to the interview data collected by the My Meadville initiative, which was supported by the city of Meadville and the City Council and hosted by the Redevelopment Authority of the City of Meadville. Our experiments were run on over 700 transcribed interviews and the output, containing an extracted summary and keywords for each interview, was shared with the My Meadville committee, which used it in their work of developing community value statements, followed by action plan items. The community work on talking to the Meadville residents continues and more documents are added as the new interviews are transcribed. The searching functionality of our system has been utilized in several community projects, including the study on the enhancement of the city’s summer parks program and the feasibility study on the creation of a Community Museum of Science, Industry, and Culture.

We encountered a few challenges in our work of AI system application in the public sphere. First of all, we came into the project after the data was collected and hence could not provide input into the transcription format of the interviews and the importance of consistency across transcriptions. Since the interviews were conducted and transcribed by dozens of different volunteers, the structure and the format of the interviews and transcriptions varied greatly. We spent a number of hours manually parsing through interview documents and editing documents that did not identify the interviewee responses easily. Secondly, we found that deploying the system for use by the My Meadville committee was challenging. My Meadville administrators and most of its volunteers had no computing training and were uncomfortable setting up and running programs. Therefore, they relied on us to gather the summary and keywords results. We did successfully train My Meadville administrators in using the searching functionality as that implementation did not rely on many libraries. We also successfully trained them to use GitHub, where we shared the source code, output files, etc. through private repositories.

In the future, we would like to extend our system to a container-based setup, where all dependencies will be included for the user in a container and they will not need to download all the dependencies before using our system. We would also like to create a web-based tool for our searching functionality instead of having an executable file that the users have to click on. Finally, we will continue to coordinate with the City of Meadville in different uses of our system and its further development in order to accomplish their goals of public service enhancements in Meadville.

References

[Aggarwal and Zhai 2012] Aggarwal, C. C., and Zhai, C. 2012. Mining text data. Springer Science & Business Media.

[Androutsopoulou et al. 2019] Androutsopoulou, A.; Karacapilidis, N.; Loukis, E.; and Charalabidis, Y. 2019. Transforming the communication between citizens and government through ai-guided chatbots. Government Information Quarterly 36(2):358–367.

[Beliga, Meštrović, and Martinčić-Ipšić 2015] Beliga, S.; Meštrović, A.; and Martinčić-Ipšić, S. 2015. An overview of graph-based keyword extraction methods and approaches. Journal of information and organizational sciences 39(1):1–20.

[Capgemini 2017] Capgemini. 2017. Unleashing the potential of artificial intelligence in the public sector. https://www.capgemini.com/consulting/wp-content/uploads/sites/30/2017/10/ai-in-public-sector.pdf.

[Chen, Ran, and Gao 2019] Chen, T.; Ran, L.; and Gao, X. 2019. Ai innovation for advancing public service: The case of china’s first administrative approval bureau. In 20th Annual International Conference on Digital Government Research, 100–108. ACM.

[datasketch 2018] datasketch. 2018. datasketch: Big data looks small. https://github.com/ekzhu/datasketch.

[Eduard and Lin 1998] Eduard, H., and Lin, C.-Y. 1998. Automated text summarization and the summarist system. In Proceedings of a workshop on held at Baltimore.

[Engin and Treleaven 2019] Engin, Z., and Treleaven, P. 2019. Algorithmic government: Automating public services and supporting civil servants in using data science technologies. The Computer Journal 62(3):448–460.

[extractdocx 2016] extractdocx. 2016. Simple function to extract text from ms xml word document (.docx) without any dependencies. https://gist.github.com/etienned/7539105. Accessed: 2018-05-30.

[Finkel, Grenager, and Manning 2005] Finkel, J. R.; Grenager, T.; and Manning, C. 2005. Incorporating non-local information into information extraction systems by gibbs sampling. Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics 363–370.

[García-Hernández et al. 2008] García-Hernández, R. A.; Montiel, R.; Ledeneva, Y.; Rendón, E.; Gelbukh, A.; and Cruz, R. 2008.

embeddings, convolutional neural networks and incremental parsing. To appear.

[Kouziokas 2017] Kouziokas, G. N. 2017. The application of artificial intelligence in public administration for forecasting high crime risk transportation areas in urban environment. Transportation research procedia 24:467–473.

[Mehr, Ash, and Fellow 2017] Mehr, H.; Ash, H.; and Fellow, D. 2017. Artificial intelligence for citizen services and government. Ash Cent. Democr. Gov. Innov. Harvard Kennedy Sch., no. August 1–12.

[Mihalcea and Tarau 2004] Mihalcea, R., and Tarau, P. 2004. Textrank: Bringing order into text. In Proceedings of the 2004 conference on empirical methods in natural language processing.

[Nathan 2016] Nathan, P. 2016. Pytextrank, a python implementation of textrank for text document nlp parsing and summarization. https://github.com/ceteri/pytextrank/.

[NetworkX developer team 2014] NetworkX developer team. 2014. Networkx.

[Scikit-learn 2018] Scikit-learn. 2018. Creating a module for sentiment analysis with nltk. https://pythonprogramming.net/sentiment-analysis-module-nltk-tutorial/.

[Siddiqi and Sharan 2015] Siddiqi, S., and Sharan, A. 2015. Keyword and keyphrase extraction techniques: a literature review. International Journal of Computer Applications 109(2).

[Sun and Medaglia 2019] Sun, T. Q., and Medaglia, R. 2019. Mapping the challenges of artificial intelligence in the public sector: Evidence from public healthcare. Government Information Quarterly 36(2):368–383.

[Thomas, Bharti, and Babu 2016] Thomas, J. R.; Bharti, S. K.; and Babu, K. S. 2016. Automatic keyword extraction for text summarization in e-newspapers. In Proceedings of the international conference on informatics and analytics, 86. ACM.

[Timms 2016] Timms, M. J. 2016. Letting artificial intelligence in education out of the box: educational cobots and smart classrooms. International Journal of Artificial Intelligence in Education 26(2):701–712.

[Valle-Cruz et al. 2019] Valle-Cruz, D.; Alejandro Ruvalcaba-Gomez, E.; Sandoval-Almazan, R.; and Ignacio Criado, J. 2019. A review of artificial intelligence in government and its potential from a public policy perspective. In 20th Annual International Conference on Digital Government Research, 91–99. ACM.

[Wirtz, Weyerer, and Geyer 2019] Wirtz, B. W.; Weyerer,
Text summarization by sentence extraction J. C.; and Geyer, C. 2019. Artificial intelligence and the using unsupervised learning. In Mexican International Con- public sector - applications and challenges. International ference on Artificial Intelligence, 133–143. Springer. Journal of Public Administration 42(7):596–615. [Gupta and Lehal 2010] Gupta, V., and Lehal, G. S. 2010. A [Wong, Wu, and Li 2008] Wong, K.-F.; Wu, M.; and Li, W. survey of text summarization extractive techniques. Journal 2008. Extractive summarization using supervised and semi- of emerging technologies in web intelligence 2(3):258–268. supervised learning. In Proceedings of the 22nd Interna- tional Conference on Computational Linguistics-Volume 1 [Honnibal and Montani 2017] Honnibal, M., and Montani, I. , 2017. spaCy 2: Natural language understanding with Bloom 985–992. Association for Computational Linguistics.
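As an illustration of the supplemental searching tool described in the text (matching a user-specified keyword against the keywords our system identified for each interview), the following is a minimal Python sketch and not the deployed implementation. The `InterviewOutput` container, the `search` function, the case-insensitive exact match, and the `window` parameter that controls how much summary text is returned with a hit are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InterviewOutput:
    """Hypothetical container for the system output of one interview."""
    document: str        # interview document name
    keywords: List[str]  # keywords identified for this interview
    summary: str         # summary text produced for this interview


def search(outputs: List[InterviewOutput], query: str,
           window: int = 60) -> List[Tuple[str, str, str]]:
    """Return (keyword, document name, excerpt) for every matching output.

    Matching is a case-insensitive exact comparison between the query and
    each stored keyword (an assumption; other matching rules are possible).
    """
    needle = query.strip().lower()
    hits = []
    for out in outputs:
        for keyword in out.keywords:
            if keyword.lower() != needle:
                continue
            # Excerpt: summary text around the first occurrence of the
            # keyword, or the start of the summary if it does not occur.
            pos = out.summary.lower().find(needle)
            if pos == -1:
                excerpt = out.summary[:window]
            else:
                start = max(0, pos - window)
                end = min(len(out.summary), pos + len(needle) + window)
                excerpt = out.summary[start:end]
            hits.append((keyword, out.document, excerpt))
            break  # report each interview document at most once
    return hits


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        InterviewOutput("interview_01.docx", ["economy", "education"],
                        "A solid base in the economy would make us stay."),
        InterviewOutput("interview_02.docx", ["buildings"],
                        "I would probably go with newer buildings."),
    ]
    for keyword, document, excerpt in search(sample, "economy"):
        print(keyword, "|", document, "|", excerpt)
```

Because the sketch relies only on the Python standard library, it is consistent with the observation that the searching functionality did not rely on many libraries.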