I am starting a new project that will require some serious text mining. So, in the interests of bringing myself up to speed on the tm package, I thought I would apply it to the Complete Works of William Shakespeare and just see what falls out.

The first order of business was getting my hands on all that text. Fortunately it is available from a number of sources.

That’s quite a solid chunk of data: 124787 lines.

    "The Project Gutenberg EBook of The Complete Works of William Shakespeare, by"
    "This eBook is for the use of anyone anywhere at no cost and with"
    "re-use it under the terms of the Project Gutenberg License included"
    ""
    "An alternative method of locating eBooks:"

There seems to be some header and footer text. We will want to get rid of that! Using a text editor I checked to see how many lines were occupied with metadata and then removed them before concatenating all of the lines into a single long, long, long string.

    > shakespeare = paste(shakespeare, collapse = " ")
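The header and footer removal can also be done in R rather than in a text editor. The following is only a minimal sketch on a toy vector standing in for the lines of the real file; the names `lines`, `n_header`, `n_footer` and `body` are hypothetical, and the counts are made up for illustration (the real values have to be found by inspecting the file).

```r
# Toy stand-in for the lines read from the downloaded text file.
# With the real file this would come from readLines() on its path.
lines <- c(
  "The Project Gutenberg EBook of The Complete Works of William Shakespeare, by",
  "This eBook is for the use of anyone anywhere at no cost and with",
  "THE SONNETS",
  "From fairest creatures we desire increase,",
  "That thereby beauty's rose might never die,",
  "",
  "An alternative method of locating eBooks:"
)

# Hypothetical counts of metadata lines at the top and bottom of the file.
# These are assumptions for the toy data, not the values for the real file.
n_header <- 2
n_footer <- 2

# Keep only the lines between the header and the footer.
body <- lines[(n_header + 1):(length(lines) - n_footer)]

# Concatenate the remaining lines into a single long string.
shakespeare <- paste(body, collapse = " ")
```

Indexing by position keeps the step explicit and easy to check against what the text editor shows, at the cost of having to recount if the source file changes.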