A Design Proposal of an Online Corpus-Driven Dictionary of Portuguese for University Students
UNIVERSIDADE DE LISBOA
FACULDADE DE LETRAS

A Design Proposal of an Online Corpus-Driven Dictionary of Portuguese for University Students

Tanara Zingano Kuhn

Thesis specially prepared to obtain the degree of Doctor in Linguistics, with a specialisation in Applied Linguistics

Supervisors: Prof.ª Doutora Margarita Maria Correia Ferreira and Prof. Doutor Carlos Alberto Marques Gouveia

Jury:
President: Prof.ª Doutora Maria Inês Pedrosa da Silva Duarte, Full Professor and Member of the Scientific Council of the Faculdade de Letras da Universidade de Lisboa
Members:
- Doutora Ana Lúcia Frankenberg-Garcia, Reader, School of Literature and Languages, University of Surrey, U.K.;
- Doutor Iztok Kosem, Assistant with a Ph.D., Faculty of Arts, University of Ljubljana, Slovenia;
- Doutora Maria Joana de Almeida Vieira Santos, Assistant Professor, Faculdade de Letras da Universidade de Coimbra;
- Doutora Margarita Maria Correia Ferreira, Assistant Professor, Faculdade de Letras da Universidade de Lisboa, supervisor;
- Doutora Maria Amália Pereira Mendes, Assistant Researcher, Centro de Linguística da Faculdade de Letras da Universidade de Lisboa;
- Doutora Sandra Maria Brito Pereira, Postdoctoral Fellow (FCT), Centro de Linguística da Faculdade de Letras da Universidade de Lisboa.

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Capes), process number 0973/13-0

2017

Acknowledgments

I would like to thank my supervisor, Margarita Correia, for all her guidance, support and belief in me. In many difficult moments, she would give me a word of comfort and reassurance.
Her encouragement was essential in this moment of my life; without her supervision and constant help, this thesis would not have been possible. I cannot thank her enough for everything that she has done for me.

I wish to thank my co-supervisor, Carlos Gouveia, for our talks, especially for some apparently simple questions that would have me thinking for a long time. His course on academic writing contributed greatly to enriching my thesis. Moreover, his kindness and encouragement were very important in this phase of my life.

I would like to express my most profound gratitude to Iztok Kosem. First and foremost, for receiving me at the University of Ljubljana for a Short-term Scientific Mission (ENeL/COST Action), during which I had the unparalleled opportunity to learn about the state-of-the-art methods, tools, and resources for e-lexicography. Moreover, his attention, true interest in my project, and valuable advice contributed greatly to the development of my PhD research. Undoubtedly, this was a crucial moment in my PhD and I am truly thankful for everything that he has done for me.

Special thanks are due to José Pedro Ferreira for helping me with the computational part of the compilation of my corpus, for our enlightening conversations on corpora design and for always making himself available in case I needed help.

I would like to thank Maria José Bocorny Finatto for introducing me to lexicography and making me fall in love with it. She first gave me the exciting opportunity to take part in her research and later co-supervised me when I was in the Netherlands. I am also deeply indebted to her for her insightful suggestions for my PhD project. Maria José has always been an inspiration and I cannot thank her enough for everything that she has done for me.

Another fundamental person in my transitioning into lexicography was Paul Bogaards (in memoriam).
His careful attention and interest in my project, together with his constant encouragement, were fundamental for my development in the area. Paul was not only an excellent supervisor but also an extremely kind person, who was always concerned about my well-being in the Netherlands. It was a great honour and a privilege to have been close to Paul.

I am also deeply indebted to Robert Lew for his generosity. Besides kindly making his work available to me and patiently answering my (many) questions, he thoughtfully informed me about the European Network of e-Lexicography when it had just been launched. Given that this network gave me the opportunity to go to Slovenia for the scientific mission with Iztok, I cannot thank Robert enough for ultimately having made all this possible.

I wish to present my warmest thanks to a very special person: Valdir do Nascimento Flores. Words are not enough to express how grateful I am for having had the invaluable opportunity to do research with him for so many years, and how lucky I am for having him as a friend. He has always been a role model for me and I would not be who I am if our lives had not crossed.

I would also like to thank my dear friend Letícia Soares Bortolini for always being there for me. We have experienced a lot together and this time was no different. Her unconditional support and encouragement were essential for me to go through this challenging time of my life.

I am also very thankful to my dear friend Yulya Mysyuk for all her kindness and care. She always made sure that I had some breaks and arranged the nicest things for us to do. Our talks were (and are) very dear to me – a conversation with her and stress would become much more manageable.

I am truly thankful to my husband Miguel, who has supported me unconditionally during this entire period. His attention and encouragement were fundamental for making me keep going.
I would also like to thank my parents, Egídio and Elisabeth, and my sister, Ananda, for their full support in everything I do. It was very reassuring to know they were always cheering for me. I also thank my mother-in-law, Tila, and father-in-law, Zé António, for their attention, support and care.

Special thanks go to Andrew Swearingen for proofreading the manuscript.

Last but not least, I would like to express my sincere appreciation to the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Capes-Brazil) for the PhD scholarship, to CELGA-ILTEC (University of Coimbra) for funding essential elements of my PhD research, such as scripts, to the European Cooperation for Science and Technology (COST) through the European Network of e-Lexicography (ENeL) Action for a Short-Term Scientific Mission, and to the University of Ljubljana for granting me a licence to use the tools required for developing my PhD research.

Abstract

University students are expected to read and write academic texts as part of typical literacy practices in higher education settings. Hyland (2009, pp. viii-ix) states that meeting these literacy demands involves “learning to use language in new ways”. In order to support the mastery of written academic Portuguese, the primary aim of this PhD research was to propose a design of an online corpus-driven dictionary of Portuguese for university students (DOPU) attending Portuguese-medium institutions, speakers of Brazilian Portuguese (BP) and European Portuguese (EP), either as a mother tongue or as an additional language. The semi-automated approach to dictionary-making (Gantar et al., 2016), which is the latest method for dictionary compilation and had never been employed for Portuguese, was tested as a means of providing lexical content that would serve as a basis for compiling entries of DOPU.
It consists of the automatic extraction of data from the corpus and its import into a dictionary writing system, where lexicographers then analyse, validate and edit the information. Thus, evaluating this method for designing DOPU was a secondary goal of this research. The procedure was performed on the Sketch Engine (Kilgarriff et al., 2004) corpus tool and the dictionary writing system used was iLex (Erlandsen, 2010). A number of new resources and tools were created especially for the extraction, given the unsuitability of the existing ones. These were: a 40-million-word corpus of academic texts (CoPEP), balanced between BP and EP and covering six areas of knowledge, a sketch grammar, and GDEX configurations for academic Portuguese. Evaluation of the adoption of the semi-automated approach in the context of the DOPU design indicated that, although further development of these brand-new resources and tools, as well as of the procedure itself, would greatly contribute to increasing the quality of DOPU’s lexical content, the extracted data can already be used as a basis for entry writing. The positive results of the experiment also suggest that this approach should be highly beneficial to other lexicographic projects for Portuguese as well.

Keywords: academic Portuguese, automated lexicography, corpus, dictionary, tools development

Summary

In higher education, students are expected to take part, to a greater or lesser extent, in reading and writing the texts that typically circulate in the university context, such as articles, books, exams, essays, monographs, projects, final-year papers, dissertations and theses, among others. However, these practices often prove to be real challenges for students, who are not familiar with these new discourse genres. According to Hyland (2009, pp. viii-ix), the condition for succeeding in these practices is “learning to use language in new ways”.

Academic language has been an object of research for many years, and this research is especially well developed for English. While for a long period all attention was focused on English for Academic Purposes (EAP), given the unmatched commercial appeal of this area, it has more recently been understood that native speakers of English also need to learn academic English because, as stated above, it is a new way of using language, one that university students are not familiar with.