Must Know Packages For a Successful Data Scientist

 3 years ago
source link: https://www.excelr.com/blog/data-science/must-know-packages-for-a-successful-data-scientist

Packages for Data Manipulation

xlsx: To read and write Excel files
foreign: To read and write SAS and SPSS files
XML: To read and write XML files
JSON: To read and write JSON files
moments: To find skewness and kurtosis
httr: A set of useful tools for working with HTTP connections
ggplot2: For data visualization
lubridate: To work with dates, date-times and time spans, for example converting dd/mm/yy to yy/mm/dd
dplyr: A consistent and fast tool for manipulating data in R
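
To make the skewness and kurtosis mentioned for moments concrete, here is a minimal sketch of the moment-based definitions, written in plain Python rather than R so the arithmetic is visible. The function names are hypothetical, and the R package may apply a different estimator or correction, so check its documentation before comparing numbers.

```python
def central_moment(values, order):
    """Population central moment of the given order (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n


def skewness(values):
    """Moment-based skewness: m3 / m2^(3/2). Zero for symmetric data."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment-based (Pearson) kurtosis: m4 / m2^2. Not the excess form."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


sample = [2, 4, 4, 4, 5, 5, 7, 9]
print("skewness:", skewness(sample))
print("kurtosis:", kurtosis(sample))
```

A positive skewness indicates a longer right tail; the kurtosis here is the raw fourth standardized moment, which is close to 3 for normally distributed data.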

Packages for Imputation

HotDeckImputation: Resolves missing data using hot-deck imputation

yaImpute: Performs nearest neighbour-based imputation using one or more alternative approaches to process multivariate data

mvnmle: Finds the maximum likelihood estimate of the mean vector and variance-covariance matrix for multivariate normal data with missing values

mice: Multiple imputation using Fully Conditional Specification (FCS), implemented by the MICE algorithm

lattice: A powerful, high-level data visualization system with an emphasis on multivariate data. Sufficient for typical graphics needs and flexible enough to handle most non-standard requirements.
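
The nearest neighbour idea behind the imputation packages above can be sketched in a few lines. This is an illustration in plain Python, not the API of HotDeckImputation or yaImpute: the function name, the dictionary-based records and the unscaled squared Euclidean distance are all assumptions made for the example.

```python
def nearest_neighbour_impute(records, target, predictors):
    """Fill missing `target` values (None) by copying the value from the
    complete record (the donor) that is closest on the predictor columns."""
    donors = [r for r in records if r[target] is not None]
    if not donors:
        raise ValueError("no complete records available as donors")

    imputed = []
    for record in records:
        filled = dict(record)  # copy, so the input list is left untouched
        if filled[target] is None:
            # Squared Euclidean distance on the predictors (assumed unscaled).
            nearest = min(
                donors,
                key=lambda d: sum((d[p] - record[p]) ** 2 for p in predictors),
            )
            filled[target] = nearest[target]
        imputed.append(filled)
    return imputed


data = [
    {"age": 25, "height": 170, "weight": 65.0},
    {"age": 40, "height": 180, "weight": 82.0},
    {"age": 27, "height": 172, "weight": None},
    {"age": 38, "height": 178, "weight": None},
]
completed = nearest_neighbour_impute(data, "weight", ["age", "height"])
for row in completed:
    print(row)
```

In practice the predictors would usually be scaled first, since a variable with a larger range otherwise dominates the distance.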

Packages for K-means

plyr: Breaks a big problem down into pieces, operates on each piece and then puts all the pieces back together

animation: Provides functions for animations in statistics, covering probability theory, mathematical statistics, multivariate statistics, nonparametric statistics, computational statistics, sampling surveys, linear models, time series, data mining and machine learning

kselection: Selection of the number of clusters via bootstrap

doParallel: Provides a parallel backend for the %dopar% function using the parallel package

cluster: Finding groups in data
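
The "break down, operate, put back together" pattern described for plyr is often called split-apply-combine. The sketch below shows the pattern itself in plain Python; the function name and the sample data are hypothetical and it does not mirror plyr's own functions.

```python
from collections import defaultdict


def split_apply_combine(rows, key, value, summarise):
    """Group rows by `key`, apply `summarise` to each group's values,
    and combine the per-group results into one dictionary."""
    groups = defaultdict(list)
    for row in rows:  # split: one list of values per group
        groups[row[key]].append(row[value])
    # apply the summary to each piece, then combine into a single result
    return {name: summarise(values) for name, values in groups.items()}


sales = [
    {"region": "north", "amount": 10},
    {"region": "south", "amount": 4},
    {"region": "north", "amount": 20},
    {"region": "south", "amount": 6},
]
totals = split_apply_combine(sales, "region", "amount", sum)
means = split_apply_combine(
    sales, "region", "amount", lambda values: sum(values) / len(values)
)
print(totals)
print(means)
```

Because the summary is passed in as a function, the same split and combine steps can be reused for totals, means or any other per-group calculation.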

Packages for KNN

class: Various functions for classification, including k-nearest neighbour, learning vector quantization and self-organizing maps

gmodels: Various R programming tools for model fitting
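
As a rough illustration of what a k-nearest neighbour classifier does, the following plain Python sketch labels a query point by majority vote among its k closest training points. It is not the interface of the class package: the function name and toy data are assumptions, and ties are resolved in favour of the label seen first among the nearest neighbours, which may differ from how an R implementation breaks ties.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among the k training
    points with the smallest Euclidean distance to it."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (2, 2)))
print(knn_predict(points, labels, (6, 5)))
```

Small odd values of k are a common choice for two-class problems because they avoid tied votes.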

Packages for Linear Regression

lattice: A powerful, high-level data visualization system with an emphasis on multivariate data. Sufficient for typical graphics needs and flexible enough to handle most non-standard requirements.

car: Functions and datasets to accompany the book "An R Companion to Applied Regression"
cor2pcor: Used to find partial correlations (a function provided by the corpcor package)
MASS: Functions and datasets to support "Modern Applied Statistics with S"
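
A partial correlation measures how strongly two variables are related once a third has been controlled for. For three variables it follows from the pairwise correlations: r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The plain Python sketch below covers only this three-variable case with hypothetical function names; cor2pcor itself operates on a whole correlation matrix, which is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def partial_correlation(x, y, z):
    """Correlation between x and y after controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# x and y both depend on z, but their remaining variation is unrelated.
z = [1, 2, 3, 4]
x = [2, 1, 2, 5]
y = [2, -1, 6, 3]
print("plain correlation:  ", pearson(x, y))
print("partial correlation:", partial_correlation(x, y, z))
```

In this constructed example x and y are positively correlated only because both follow z, so the partial correlation comes out at essentially zero.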

Packages for Naive Bayes

e1071: Functions for latent class analysis, fuzzy clustering, short-time Fourier transform, support vector machines, shortest path computation, bagged clustering and the naive Bayes classifier

gmodels: Various R programming tools for model fitting

Packages for Text Mining

rJava: Low-level interface to the Java VM, similar to the .C/.Call interface. It allows creating objects, calling methods and accessing fields.

tm: A framework for text mining applications within R

SnowballC: Collapses words to a common root to aid comparison of vocabulary. Currently supports the Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish languages.

wordcloud: Displays words visually as word clouds

RWeka: A collection of machine learning algorithms for data mining tasks written in Java, containing tools for data pre-processing, visualization, association rules, classification, regression and clustering

igraph: Routines for simple graph and network analysis. Handles large graphs and provides functions for generating random and regular graphs, graph visualization and centrality methods.

qdap: Automates many of the tasks associated with quantitative discourse analysis of transcripts, and provides parsing tools for preparing transcript data

maptpx: Posterior maximization for topic models (LDA) in text analysis
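
Most text mining workflows, including the word clouds mentioned above, start from a table of term frequencies. The plain Python sketch below shows that counting step only; the function name, the simple lowercase tokenizer and the tiny stop word list are assumptions for illustration, and it performs no stemming of the kind SnowballC provides.

```python
import re
from collections import Counter


def term_frequencies(documents, stopwords=frozenset()):
    """Count how often each word occurs across all documents,
    ignoring case and skipping any word listed in `stopwords`."""
    counts = Counter()
    for text in documents:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in stopwords)
    return counts


docs = ["The cat sat on the mat.", "The dog sat."]
frequencies = term_frequencies(docs, stopwords={"the", "on"})
print(frequencies.most_common())
```

The resulting counts are what a word cloud would scale its words by, with the most frequent terms drawn largest.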

Packages for SVM/Neural Networks

kernlab: Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction. Among other methods, kernlab includes Support Vector Machines, Spectral Clustering, Kernel PCA, Gaussian Processes and a QP solver.

neuralnet: Training of neural networks using backpropagation or resilient backpropagation. The package allows flexible settings through a custom choice of error and activation functions.

Packages for Twitter

twitteR: Provides an interface to the Twitter web API

base64enc: Provides tools for handling base64 encoding. It is more flexible than the orphaned base64 package.

httpuv: Provides protocol support for handling HTTP and WebSocket requests directly from R. It is a building block for other packages.
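
Base64 encoding, which base64enc handles in R, turns arbitrary bytes into plain ASCII text so they can travel safely inside HTTP headers and API requests. The sketch below uses Python's standard base64 module to show the round trip; the two helper names are hypothetical and are not functions of the R package.

```python
import base64


def encode_text(text):
    """Encode a string as base64 text (UTF-8 bytes in, ASCII out)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_text(encoded):
    """Reverse encode_text: base64 text back to the original string."""
    return base64.b64decode(encoded).decode("utf-8")


token = encode_text("key:secret")
print(token)
print(decode_text(token))
```

Base64 is an encoding, not encryption: anyone can decode the result, so it protects formatting rather than secrecy.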

