13.5. Summary

This chapter introduced techniques for working with text to clean and analyze data, including string manipulation, regular expressions, and document analysis. Text data holds rich information about how people live, work, and think. But this data is also hard for computers to use: think about all the creative ways people manage to spell the same word. The techniques in this chapter let us correct typos, extract features from web server logs, and compare speeches. In our experience, even basic text techniques open up all sorts of interesting analyses; a little bit goes a long way.
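As a compact reminder of two of these ideas, here is a minimal sketch that pulls fields out of a web server log line with a regular expression and then canonicalizes two spellings of the same name so they match. The log line, the names, and the `canonicalize` helper are made up for illustration and are not taken from the chapter's datasets; the pattern assumes a log in Common Log Format.

```python
import re

# Hypothetical log line in Common Log Format (invented for this sketch)
log_line = ('192.0.2.10 - - [26/Jan/2004:10:47:58 -0800] '
            '"GET /course/notes.html HTTP/1.1" 200 2585')

# One capture group per field we want to keep as a feature
pattern = (r'^(\S+) \S+ \S+ \[([^\]]+)\] '
           r'"(\S+) (\S+) [^"]*" (\d{3}) (\d+|-)$')
ip, timestamp, method, path, status, size = re.match(pattern, log_line).groups()


def canonicalize(name):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    name = name.lower()
    name = re.sub(r'[^a-z0-9\s]', '', name)
    return re.sub(r'\s+', ' ', name).strip()


# Two spellings of the same name (hypothetical example strings)
spelling_a = 'St. John the Baptist  Parish'
spelling_b = 'st john the baptist parish'
same = canonicalize(spelling_a) == canonicalize(spelling_b)

print(ip, method, path, status)
print(same)
```

The capture groups come back as strings, so a field like `status` still needs to be converted with `int` before any numeric analysis. Real logs and real names are messier than this, and the cleaning rules usually need to be adjusted after inspecting the data.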