Must Know Packages For a Successful Data Scientist
source link: https://www.excelr.com/blog/data-science/must-know-packages-for-a-successful-data-scientist
Packages for Data Manipulation
xlsx: To read and write Excel files
foreign: To read and write SAS and SPSS files
XML: To read and write XML files
JSON: To read and write JSON files
moments: To compute skewness and kurtosis
httr: A set of useful tools for working with HTTP connections
ggplot2: For data visualization
lubridate: To work with dates, date-times and time spans, for example converting dd/mm/yy to yy/mm/dd
dplyr: A fast, consistent tool for working with and modifying data in R
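
As a small illustration of how lubridate and dplyr from the list above fit together, the sketch below parses dd/mm/yy strings, rewrites them as yy/mm/dd and totals a value per month. The `orders` data frame and its column names are made up for the example.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data: dates stored as dd/mm/yy strings
orders <- data.frame(
  order_date = c("05/01/21", "17/01/21", "03/02/21"),
  amount     = c(120, 80, 200)
)

# lubridate: parse dd/mm/yy, then print the same dates as yy/mm/dd
parsed      <- dmy(orders$order_date)
reformatted <- format(parsed, "%y/%m/%d")

# dplyr: add a month column and total the amounts per month
monthly <- orders %>%
  mutate(order_date = dmy(order_date),
         month      = month(order_date)) %>%
  group_by(month) %>%
  summarise(total = sum(amount))

print(reformatted)
print(monthly)
```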
Packages for Imputation
HotDeckImputation: To resolve missing data using hot deck imputation
yaImpute: Performs nearest neighbour-based imputation using one or more alternative approaches to process multivariate data
mvnmle: Finds the maximum likelihood estimate of the mean vector and variance-covariance matrix for multivariate normal data with missing values
mice: Multiple imputation using Fully Conditional Specification (FCS), implemented by the MICE algorithm
lattice: A powerful, high-level data visualization system with an emphasis on multivariate data. Sufficient for typical graphics needs and flexible enough to handle non-standard requirements.
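
A minimal sketch of multiple imputation with mice might look like the following. It uses the small `nhanes` dataset that ships with the package; the choice of five imputations, predictive mean matching and the `bmi ~ age` model are assumptions made for illustration, not recommendations from the article.

```r
library(mice)

# nhanes ships with mice and contains missing values in several columns
missing_before <- sum(is.na(nhanes))

# Create 5 imputed datasets using predictive mean matching
imp <- mice(nhanes, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# Extract the first completed dataset
completed <- complete(imp, 1)

# Fit the same model on every imputed dataset and pool the estimates
fit    <- with(imp, lm(bmi ~ age))
pooled <- pool(fit)
print(summary(pooled))
```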
Packages for K-means
plyr: Breaks a big problem down into pieces, operates on each piece and then puts all the pieces back together
animation: Provides functions for animations in statistics, covering probability theory, mathematical statistics, multivariate statistics, nonparametric statistics, computational statistics, sampling survey, linear models, time series, data mining and machine learning
kselection: Selection of the number of clusters via bootstrap
doParallel: Provides a parallel backend for the %dopar% function using the parallel package
cluster: Finding groups in data (cluster analysis)
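
To make the clustering workflow concrete, here is a minimal sketch that runs K-means (the `kmeans()` function from base R's stats package) and then uses `silhouette()` from the cluster package to judge how well separated the groups are. The simulated two-blob data is an assumption for the example.

```r
library(cluster)

set.seed(42)

# Simulated data: two well-separated groups of 50 points in 2 dimensions
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2)
)

# K-means with 2 centres and several random restarts
fit <- kmeans(x, centers = 2, nstart = 10)

# Silhouette widths: values close to 1 indicate well-separated clusters
sil       <- silhouette(fit$cluster, dist(x))
avg_width <- mean(sil[, "sil_width"])

print(table(fit$cluster))
print(avg_width)
```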
Packages for KNN
class: Various functions for classification, including k-nearest neighbour, learning vector quantization and self-organizing maps
gmodels: Various R programming tools for model fitting
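
A minimal k-nearest-neighbour sketch with the class package, assuming the built-in `iris` data, a random 100/50 train/test split and `k = 5`, could look like this. The confusion matrix here uses base R's `table()`; `CrossTable()` from gmodels is one option for a more detailed cross-tabulation.

```r
library(class)

set.seed(1)

# Random split of iris into 100 training rows and 50 test rows
idx          <- sample(nrow(iris), 100)
train        <- iris[idx, 1:4]
test         <- iris[-idx, 1:4]
train_labels <- iris$Species[idx]
test_labels  <- iris$Species[-idx]

# Classify each test row by majority vote among its 5 nearest neighbours
pred <- knn(train, test, cl = train_labels, k = 5)

# Confusion matrix and overall accuracy
confusion <- table(predicted = pred, actual = test_labels)
accuracy  <- mean(pred == test_labels)

print(confusion)
print(accuracy)
```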
Packages for Linear Regression
lattice: A powerful, high-level data visualization system with an emphasis on multivariate data. Sufficient for typical graphics needs and flexible enough to handle most non-standard requirements.
car: Functions and datasets to accompany "An R Companion to Applied Regression"
corpcor: Provides cor2pcor(), used to find partial correlations
MASS: Functions and datasets to support "Modern Applied Statistics with S"
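
As a rough sketch of a regression workflow with these packages, the example below fits a linear model with base R's `lm()` and then lets `stepAIC()` from MASS drop predictors that do not improve the AIC. The `mtcars` dataset and the chosen predictors are assumptions for illustration.

```r
library(MASS)

# Full model: fuel consumption explained by four candidate predictors
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Backward elimination guided by AIC (stepAIC comes from MASS)
reduced <- stepAIC(full, direction = "backward", trace = FALSE)

print(summary(reduced))
print(c(full = AIC(full), reduced = AIC(reduced)))
```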
Packages for Naive Bayes
e1071: Functions for latent class analysis, fuzzy clustering, short-time Fourier transform, support vector machines, shortest path computation, bagged clustering and the naive Bayes classifier
gmodels: Various R programming tools for model fitting
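
A minimal naive Bayes sketch using `naiveBayes()` from e1071 is shown below. The `iris` data and the 100/50 split are assumptions chosen to keep the example self-contained.

```r
library(e1071)

set.seed(1)

# Random split of iris into 100 training rows and 50 test rows
idx      <- sample(nrow(iris), 100)
train_df <- iris[idx, ]
test_df  <- iris[-idx, ]

# Fit a naive Bayes classifier on all four measurements
nb_model <- naiveBayes(Species ~ ., data = train_df)

# Predict species for the held-out rows and measure accuracy
nb_pred     <- predict(nb_model, test_df)
nb_accuracy <- mean(nb_pred == test_df$Species)

print(table(predicted = nb_pred, actual = test_df$Species))
print(nb_accuracy)
```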
Packages for Text Mining
rJava: Low-level interface to the Java VM, similar to the .C/.Call interface. This allows creation of objects, calling methods and accessing fields.
tm: A framework for text mining applications within R
SnowballC: Collapses words to a common root (stemming) to aid comparison of vocabulary. Currently supports Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.
wordcloud: Creates word clouds to visualize word frequencies
RWeka: An interface to Weka, a collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, visualization, association rules, classification, regression and clustering
igraph: Routines for simple graph and network analysis. Handles large graphs and provides functions for generating random and regular graphs, graph visualization and centrality methods.
qdap: Automates many of the tasks associated with quantitative discourse analysis of transcripts, and includes parsing tools for preparing transcript data
maptpx: Posterior maximization for topic models (LDA) in text analysis
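
A typical first step in text mining is turning raw text into a document-term matrix. The sketch below does this with tm, using SnowballC (called indirectly through `stemDocument()`) for stemming. The three sample sentences are invented for the example.

```r
library(tm)
library(SnowballC)  # used by stemDocument() for stemming

# Hypothetical sample documents
texts <- c(
  "Text mining extracts patterns from documents.",
  "Clustering groups similar documents together.",
  "Stemming collapses words to a common root."
)

# Build a corpus and apply standard cleaning steps
corpus <- VCorpus(VectorSource(texts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)

# Rows are documents, columns are stemmed terms, cells are counts
dtm        <- DocumentTermMatrix(corpus)
term_freqs <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)

print(term_freqs)
```

The resulting `term_freqs` vector is the kind of input a word cloud function would take to size each word by its frequency.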
Packages for SVM/Neural Networks
kernlab: Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. kernlab includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.
neuralnet: Training of neural networks using backpropagation or resilient backpropagation. Allows flexible settings through a custom choice of error and activation function.
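
For the SVM side, a minimal sketch with `ksvm()` from kernlab could look like the following. The `iris` data, the RBF kernel and `C = 1` are assumptions picked for illustration rather than tuned settings.

```r
library(kernlab)

set.seed(1)

# Random split of iris into 100 training rows and 50 test rows
idx      <- sample(nrow(iris), 100)
train_df <- iris[idx, ]
test_df  <- iris[-idx, ]

# Support vector machine with a Gaussian (RBF) kernel
svm_fit <- ksvm(Species ~ ., data = train_df, kernel = "rbfdot", C = 1)

# Predict on the held-out rows and measure accuracy
svm_pred     <- predict(svm_fit, test_df)
svm_accuracy <- mean(svm_pred == test_df$Species)

print(svm_accuracy)
```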
Packages for Twitter
twitteR: Provides an interface to the Twitter web API
base64enc: Provides tools for handling base64 encoding. It is more flexible than the orphaned base64 package.
httpuv: Provides protocol support for handling HTTP and WebSocket requests directly from R. It is a building block for other packages.
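
Of these, base64enc is the easiest to try without API credentials. The sketch below encodes a short string to base64 and decodes it back; the sample string is arbitrary.

```r
library(base64enc)

# Encode the raw bytes of a string as base64 text
original <- "hello"
encoded  <- base64encode(charToRaw(original))

# Decode back to raw bytes and convert to a character string
decoded <- rawToChar(base64decode(encoded))

print(encoded)
print(decoded)
```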