Working with Text
Tools, Techniques and Approaches for Text Mining
1st Edition - July 12, 2016
Authors: Emma Tonkin, Gregory J.L Tourte
Language: English
Paperback ISBN: 9781843347491
eBook ISBN: 9781780634302
What is text mining, and how can it be used? What relevance do these methods have to everyday work in information science and the digital humanities? How does one develop competences in text mining? Working with Text provides a series of cross-disciplinary perspectives on text mining and its applications. As text mining raises legal and ethical issues, the legal background of text mining and the responsibilities of the engineer are discussed in this book. Chapters provide an introduction to the use of the popular GATE text mining package with data drawn from social media, the use of text mining to support semantic search, the development of an authority system to support content tagging, and recent techniques in automatic language evaluation. Focused studies describe text mining on historical texts, automated indexing using constrained vocabularies, and the use of natural language processing to explore the climate science literature. Interviews are included that offer a glimpse into the real-life experience of working within commercial and academic text mining.
Introduces text analysis and text mining tools
Provides a comprehensive overview of costs and benefits
Introduces the topic, making it accessible to a general audience in a variety of fields, including examples from biology, chemistry, sociology, and criminology
Students, researchers and information professionals
Preface
Acknowledgements
Chapter 1: Working with Text
1.1 Introduction: Portraits of the Past
1.2 The Reading Robot
1.3 From Data to Text Mining
1.4 Definitions of Text Mining
1.5 Exploring the Disciplinary Neighbourhood
1.6 Prerequisites for Text Mining
1.7 Learning Minecraft: What Makes a Text Miner?
1.8 Contemporary Attitudes to Text Mining
1.9 Conclusions
Chapter 2: A Day at Work (with Text): A Brief Introduction
Abstract
2.1 Introduction
2.2 Encouraging an Interest in Text Mining
2.3 Legal and Ethical Aspects of Text Mining
2.4 Manual Annotation: Preparing for Evaluation
2.5 Common Text Mining Tasks
2.6 Basic Corpus Analysis
2.7 Preprocessing a Text
2.8 Extracting Features from a Text
2.9 Information Extraction
2.10 Applications of Indexing and Metadata Extraction
2.11 Extraction of Subjective Views
2.12 Build, Customise or Apply? Choosing an Appropriate Implementation
2.13 Evaluation
2.14 The Role of Visualisation in Text Mining
2.15 Visualisation Tools and Frameworks
2.16 Conclusions
Chapter 3: If You Find Yourself in a Hole, Stop Digging: Legal and Ethical Issues of Text/Data Mining in Research
Abstract
3.1 Introduction
3.2 Key Legal Issues in Data Mining
3.3 Ethics
3.4 Conclusions: Working on the Borders of Law and Ethics
Chapter 4: Responsible Content Mining
Abstract
4.1 Introduction to Content Mining
4.2 Obtaining Permission to Content Mine
4.3 Responsible Crawling
4.4 Publication of Results
4.5 Citation and Acknowledgement
4.6 Proposed Best Practise Guidelines for Content Mining
Chapter 5: Text Mining for Semantic Search in Europe PubMed Central Labs
Abstract
5.1 Introduction
5.2 Previous Work
5.3 Design and Implementation
5.4 Performance and Critique
5.5 Conclusions
5.6 Availability
Appendix: Resources Used for Indexing
Chapter 6: Extracting Information from Social Media with GATE
Abstract
Acknowledgements
6.1 Introduction
6.2 Social Media Streams: Characteristics, Challenges and Opportunities
6.3 The GATE Family of Text Mining Tools: An Overview
6.4 Information Extraction: An Overview
6.5 IE from Social Media with GATE
6.6 Conclusion and Future Work
Chapter 7: Newton: Building an Authority-Driven Company Tagging and Resolution System
Abstract
Acknowledgements
7.1 Introduction
7.2 Related Work
7.3 System Overview
7.4 Learning Company Name Links
7.5 System Development
7.6 Conclusions
Chapter 8: Automatic Language Identification
Abstract
Acknowledgements
8.1 Introduction
8.2 Historical Overview
8.3 Computational Techniques
8.4 Applications and Related Tasks
8.5 Conclusion
Chapter 9: User-Driven Text Mining of Historical Text
Abstract
Acknowledgements
9.1 Related Work on Text Mining Historical Documents
9.2 The Trading Consequences System
9.3 Data Collections
9.4 Challenges of Processing Digitised Historical Text
9.5 Text Mining Component
9.6 User-Driven Text Mining
9.7 Conclusion
Chapter 10: Automatic Text Indexing with SKOS Vocabularies in HIVE
Abstract
Acknowledgements
10.1 Introduction
10.2 Automatic Indexing with Machine Learning
10.3 Algorithms for Text Data Mining: KEA, KEA++ and MAUI
10.4 Algorithm Training and Workflow
10.5 The HIVE System
10.6 Text Mining for Documents Indexing Using SKOS Vocabularies in HIVE
10.7 Conclusions
Chapter 11: The PIMMS Project and Natural Language Processing for Climate Science: Extending the ChemicalTagger Natural Language Processing Tool with Climate Science Controlled Vocabularies
Abstract
Acknowledgements
11.1 Introduction
11.2 Methodology
11.3 Results
11.4 Overall Conclusions and Suggestions for Further Work
Chapter 12: Building Better Mousetraps: A Linguist in NLP
Chapter 13: Raúl Garreta, Co-founder of Tryolabs.com, Tells Emma Tonkin About the Journey from Software Engineering Graduate to Startup Entrepreneur
Appendix A: Resources for Text Mining
A.1 Introduction
A.2 Text Mining Software and Libraries
A.3 Text Mining Frameworks and Packages
A.4 Web Mining Packages
A.5 Data Mining Packages
A.6 A Selection of Components and Packages
A.7 Web Interfaces for Text Mining
A.8 Distribution and Scaling
Appendix B: Databases and Vocabularies
B.1 Sample Data Sets
B.2 Datasets Primarily Used for Text Categorization
Sources
Uses
B.3 Useful Tertiary Data Sets
Sources
Appendix C: Visualisation Tools and Resources
C.1 D3 – Data Driven Documents
C.2 Processing and Processing.js
C.3 Map Display
C.4 Command Line Visualisation Tools
C.5 Graphical Tools
C.6 Geographic Data Sets
Appendix D: Learning Opportunities
D.1 United Kingdom
D.2 Ireland
D.3 Sweden
D.4 France
D.5 United States
D.6 Short Courses, Training Courses and MOOCs
Index
No. of pages: 344
Edition: 1
Published: July 12, 2016
Imprint: Chandos Publishing
Emma Tonkin
Emma Tonkin is a Senior Research Associate in the Faculty of Engineering at the University of Bristol. She has held positions in several universities, having previously worked in Digital Library research at UKOLN, University of Bath, and in the Department of Digital Humanities, King’s College, London. She holds a PhD in Computer Science from the University of Bristol. Her primary research interests include text and data mining, human-computer interaction, and the development of hybrid systems that combine human and machine classification.
Affiliations and expertise
Senior Research Associate, Faculty of Engineering, University of Bristol, UK
Gregory J.L Tourte
Gregory Tourte is a Senior Research Associate in the School of Geographical Sciences at the University of Bristol. He started there as a system administrator for the research group’s supercomputer and used the opportunity to research data management, prompted by the large quantity of data being generated. He continues his work with deep time climate modelling within the Bristol Research Initiative for the Dynamic Global Environment (BRIDGE).
Affiliations and expertise
Senior Research Associate, School of Geographical Sciences, University of Bristol, UK