Language and computers

Markus Dickinson (Author), Chris Brew (Author), Detmar Meurers (Author)
"Language and Computers introduces students to the fundamentals of how computers are used to represent, process, and organize textual and spoken information. Concepts are grounded in real-world examples familiar to students' experiences of using language and computers in everyday life. A real-world introduction to the fundamentals of how computers process language, written specifically for the undergraduate audience, introducing key concepts from computational linguistics. Offers a comprehensive explanation of the problems computers face in handling natural language. Covers a broad spectrum of language-related applications and issues, including major computer applications involving natural language and the social and ethical implications of these new developments. The book focuses on real-world examples with which students can identify, using these to explore the technology and how it works. Features "under-the-hood" sections that give greater detail on selected advanced topics, rendering the book appropriate for more advanced courses, or for independent study by the motivated reader."--Provided by publisher
eBook, English, 2013
Wiley-Blackwell, a John Wiley & Sons, Ltd., Publication, Chichester, West Sussex, 2013
1 online resource (232 pages)
9781118323168, 9781118324967, 9781118323182, 1118323165, 111832496X, 1118323181
781848600
What This Book Is About
Overview for Instructors
Acknowledgments
1 Prologue: Encoding Language on Computers
  1.1 Where do we start?
    1.1.1 Encoding language
  1.2 Writing systems used for human languages
    1.2.1 Alphabetic systems
    1.2.2 Syllabic systems
    1.2.3 Logographic writing systems
    1.2.4 Systems with unusual realization
    1.2.5 Relation to language
  1.3 Encoding written language
    1.3.1 Storing information on a computer
    1.3.2 Using bytes to store characters
  1.4 Encoding spoken language
    1.4.1 The nature of speech
    1.4.2 Articulatory properties
    1.4.3 Acoustic properties
    1.4.4 Measuring speech
    Under the Hood 1: Reading a spectrogram
    1.4.5 Relating written and spoken language
    Under the Hood 2: Language modeling for automatic speech recognition
2 Writers’ Aids
  2.1 Introduction
  2.2 Kinds of spelling errors
    2.2.1 Nonword errors
    2.2.2 Real-word errors
  2.3 Spell checkers
    2.3.1 Nonword error detection
    2.3.2 Isolated-word spelling correction
    Under the Hood 3: Dynamic programming
  2.4 Word correction in context
    2.4.1 What is grammar?
    Under the Hood 4: Complexity of languages
    2.4.2 Techniques for correcting words in context
    Under the Hood 5: Spell checking for web queries
  2.5 Style checkers
3 Language Tutoring Systems
  3.1 Learning a language
  3.2 Computer-assisted language learning
  3.3 Why make CALL tools aware of language?
  3.4 What is involved in adding linguistic analysis?
    3.4.1 Tokenization
    3.4.2 Part-of-speech tagging
    3.4.3 Beyond words
  3.5 An example ICALL system: TAGARELA
  3.6 Modeling the learner
4 Searching
  4.1 Introduction
  4.2 Searching through structured data
  4.3 Searching through unstructured data
    4.3.1 Information need
    4.3.2 Evaluating search results
    4.3.3 Example: Searching the web
    4.3.4 How search engines work
    Under the Hood 6: A brief tour of HTML
  4.4 Searching semi-structured data with regular expressions
    4.4.1 Syntax of regular expressions
    4.4.2 Grep: An example of using regular expressions
    Under the Hood 7: Finite-state automata
  4.5 Searching text corpora
    4.5.1 Why corpora?
    4.5.2 Annotated language corpora
    Under the Hood 8: Searching for linguistic patterns on the web
5 Classifying Documents: From Junk Mail Detection to Sentiment Classification
  5.1 Automatic document classification
  5.2 How computers “learn”
    5.2.1 Supervised learning
    5.2.2 Unsupervised learning
  5.3 Features and evidence
  5.4 Application: Spam filtering
    5.4.1 Base rates
    5.4.2 Payoffs
    5.4.3 Back to documents
  5.5 Some types of document classifiers
    5.5.1 The Naive Bayes classifier
    Under the Hood 9: Naive Bayes
    5.5.2 The perceptron
    5.5.3 Which classifier to use
  5.6 From classification algorithms to context of use
6 Dialog Systems
  6.1 Computers that “converse”?
  6.2 Why dialogs happen
  6.3 Automating dialog
    6.3.1 Getting started
    6.3.2 Establishing a goal
    6.3.3 Accepting the user’s goal
    6.3.4 The caller plays her role
    6.3.5 Giving the answer
    6.3.6 Negotiating the end of the conversation
  6.4 Conventions and framing expectations
    6.4.1 Some framing expectations for games and sports
    6.4.2 The framing expectations for dialogs
  6.5 Properties of dialog
    6.5.1 Dialog moves
    6.5.2 Speech acts
    6.5.3 Conversational maxims
  6.6 Dialog systems and their tasks
  6.7 Eliza
  Under the Hood 10: How Eliza works
  6.8 Spoken dialogs
  6.9 How to evaluate a dialog system
  6.10 Why is dialog important?
7 Machine Translation Systems
  7.1 Computers that “translate”?
  7.2 Applications of translation
    7.2.1 Translation needs
    7.2.2 What is machine translation really for?
  7.3 Translating Shakespeare
  7.4 The translation triangle
  7.5 Translation and meaning
  7.6 Words and meanings
    7.6.1 Words and other languages
    7.6.2 Synonyms and translation equivalents
  7.7 Word alignment
  7.8 IBM Model 1
  Under the Hood 11: The noisy channel model
  Under the Hood 12: Phrase-based statistical translation
  7.9 Commercial automatic translation
    7.9.1 Translating weather reports
    7.9.2 Translation in the European Union
    7.9.3 Prospects for translators
8 Epilogue: Impact of Language Technology
References
Concept Index