rdomains.Rmd
The latest development version of the package is always on GitHub. To install it from GitHub (this requires the devtools package):
library(devtools)
install_github("themains/rdomains")
To install the package from CRAN, type in:
install.packages("rdomains")
Next, load the package:
library(rdomains)
To get the category of the content from shallalist, first download the latest file using:
And then, get the category using:
shalla_cat("http://www.google.com")
## domain_name shalla_category
## 1 google.com searchengines
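Note that the output reports the bare domain name (google.com) rather than the full URL that was passed in. As a rough illustration of that kind of normalization, here is a minimal base R sketch. The helper `url_to_domain` is hypothetical and is not part of rdomains; the package's own parsing may differ.

```r
# Hypothetical helper (not part of rdomains): reduce a URL to a bare domain name.
url_to_domain <- function(url) {
  host <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)  # drop the scheme, e.g. "http://"
  host <- sub("[/?#].*$", "", host)                    # drop path, query, and fragment
  host <- sub(":[0-9]+$", "", host)                    # drop a trailing port number
  sub("^www\\.", "", tolower(host))                    # lowercase and drop a leading "www."
}

url_to_domain(c("http://www.google.com", "https://www.example.org/path?q=1"))
```

The sketch only strips a leading "www." and does not consult a public suffix list, so it keeps other subdomains as they are.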
To get the category of the content from DMOZ, first download the archived parsed CSV file using:
And then, get the category using:
dmoz_cat("http://www.google.com")
To get the probability that a domain hosts adult content, based on features of the domain name and suffix alone:
adult_ml1_cat("http://www.google.com")
## domain_name category
## 1 google.com 0.3133728
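The `category` column here holds a probability rather than a label. If a yes/no flag is needed, one option is to apply a cutoff. The sketch below rebuilds the output above by hand so that it runs without the package; the 0.5 cutoff is an assumption for illustration, not a value recommended by rdomains.

```r
# Rebuild the structure of the output above; `category` holds the predicted probability.
res <- data.frame(domain_name = "google.com", category = 0.3133728)

# Assumed cutoff, for illustration only; choose one that suits your application.
cutoff <- 0.5
res$adult <- res$category >= cutoff
res
```

A lower cutoff flags more domains at the cost of more false positives, so the right value depends on how the flag will be used.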
Start by getting an API key from VirusTotal.
Then get the VirusTotal category by running:
virustotal_cat("http://www.google.com")
## domain bitdefender dr_web alexa google websense trendmicro
## 1 http://www.google.com searchengines chats google searchengines advertisements search engines portals
Get the content category of a domain according to McAfee (Trusted):
trusted_cat("http://www.google.com")
## url status categorization reputation
## 2 http://www.google.com Categorized URL - Search Engines Minimal Risk
To get the category of content from Amazon (Alexa) (which provides it via DMOZ), start by getting credentials from https://aws.amazon.com/. Next, set the environment variables:
Sys.setenv(AWS_ACCESS_KEY_ID = "key_id")
Sys.setenv(AWS_SECRET_ACCESS_KEY = "secret_key")
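Before calling the API, it can help to confirm that both variables are actually set in the current session. A minimal base R sketch follows; the helper `missing_env` is hypothetical and not part of rdomains.

```r
# Hypothetical helper: return the names of environment variables that are unset or empty.
missing_env <- function(vars) {
  vals <- unname(Sys.getenv(vars, unset = ""))  # "" for any variable that is not set
  vars[!nzchar(vals)]
}

missing <- missing_env(c("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"))
if (length(missing) > 0) {
  message("Not set: ", paste(missing, collapse = ", "))
}
```

`Sys.setenv()` requires named arguments of the form `NAME = "value"`, and `Sys.getenv()` only reads values, so a check like this catches a variable that was never assigned.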
Then run:
alexa_cat(domain="http://www.google.com")[1,]
## Title AbsolutePath
## 1 Search Engines/Google Top/Computers/Internet/Searching/Search_Engines/Google
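The `AbsolutePath` value encodes the category hierarchy as a slash-separated string. To work with individual levels, the path can be split in base R. This sketch copies the path from the output above, so it runs without credentials; the variable names are illustrative.

```r
# Path copied from the AbsolutePath column in the output above.
path <- "Top/Computers/Internet/Searching/Search_Engines/Google"

# Split on "/" to get one element per level of the hierarchy.
path_levels <- strsplit(path, "/", fixed = TRUE)[[1]]

top_category  <- path_levels[2]                    # first level below the root "Top"
leaf_category <- path_levels[length(path_levels)]  # most specific level
```

Here `top_category` is "Computers" and `leaf_category` is "Google".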