not_news.Rd
Based on a slightly amended version of the regular expression used to classify news and non-news in "Exposure to ideologically diverse news and opinion on Facebook" by Bakshy, Messing, and Adamic. Science. 2015.
not_news(url_list = NULL)
vector of URLs
data.frame with 3 columns: url, not_news, news
Amendment: the soft news pattern uses "sport" rather than "sports".
A URL containing any of the following strings is classified as soft news: "sport|entertainment|arts|fashion|style|lifestyle|leisure|celeb|movie|music|gossip|food|travel|horoscope|weather|gadget"
A URL containing any of the following strings is classified as hard news: "politi|usnews|world|national|state|elect|vote|govern|campaign|war|polic|econ|unemploy|racis|energy|abortion|educa|healthcare|immigration"
Note that the classification is based on patterns observed in a small set of domains. See the paper for details.
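The matching described above can be approximated with base R's grepl(). The following is a minimal sketch, not the package's own code: the helper name classify_sketch is hypothetical, and the choice of case-sensitive matching and logical output columns is an assumption, since the documentation does not state either.

```r
# Patterns copied from the documentation above.
soft_pattern <- "sport|entertainment|arts|fashion|style|lifestyle|leisure|celeb|movie|music|gossip|food|travel|horoscope|weather|gadget"
hard_pattern <- "politi|usnews|world|national|state|elect|vote|govern|campaign|war|polic|econ|unemploy|racis|energy|abortion|educa|healthcare|immigration"

# Hypothetical helper (assumption): flags each URL against both patterns
# and returns the three columns named in the documentation.
classify_sketch <- function(urls) {
  data.frame(
    url = urls,
    not_news = grepl(soft_pattern, urls),  # TRUE if any soft news string occurs
    news = grepl(hard_pattern, urls),      # TRUE if any hard news string occurs
    stringsAsFactors = FALSE
  )
}

res <- classify_sketch(c("http://www.bbc.com/sport",
                         "http://www.washingtontimes.com/news/politics/"))
print(res)
```

In this sketch the two columns are computed independently, so a URL such as "http://example.com/world/sport" would be flagged in both. Because the pattern contains "sport" rather than "sports", it matches URLs containing either form.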
if (FALSE) {
not_news("http://www.bbc.com/sport")
not_news(c("http://www.bbc.com/sport", "http://www.washingtontimes.com/news/politics/"))
}