
source link: https://blog.feelyou.top/posts/1984236213.html

A collection of Data Science Interview Questions Solved in Python and Spark: Hands-on Big Data and Machine Learning

2018-11-30


1.What are the most important machine learning techniques?

2.Why is it important to have a robust set of metrics for machine learning?

3.Why are feature extraction and engineering so important in machine learning?

5.What is a training set, a validation set, a test set and a gold set in supervised and unsupervised learning?

6.What is the bias-variance tradeoff?

7.What is cross-validation and what is overfitting?

8.Why are vectors and norms used in machine learning?

9.What are the essential datatypes in NumPy, SciPy and Spark?

10.Can you provide an example of Map and Reduce in Spark? (Let’s compute the Mean Squared Error)

11.Can you provide examples for other computations in Spark?

12.How does Python interact with Spark?

13.What support does Spark provide for machine learning?

14.How does Spark work in a parallel environment?

15.What is the mean, the variance, and the covariance?

16.What are percentiles and quartiles?

17.Can you transform an XML file into Python Pandas?

18.Can you read HTML into Python Pandas?

19.Can you read JSON into Python Pandas?

20.Can you draw a function from Python?

21.Can you represent a graph in Python?

22.What is an IPython notebook?

23.What is a convenient tool for performing data statistics?

24.What is a convenient way to visualize data statistics?

25.How to compute covariance and correlation matrices with pandas?

26.Can you provide an example of connection to the Twitter API?

27.Can you provide an example of connection to the LinkedIn API?

28.Can you provide an example of connection to the Facebook API?

29.What is TFxIDF?

30.What is “feature hashing”? And why is it useful for BigData?

31.What is “continuous feature binning”?

32.What is Lp normalization?

33.What is Chi-Square selection?

34.What is mutual information and how can it be used for feature selection?

35.What is a loss function, what are linear models, and what do we mean by regularization parameters in machine learning?

36.What is an odds ratio?

37.What is a sigmoid function and what is a logistic function?

38.What is gradient descent?

39.What is stochastic gradient descent?

40.What is Linear Least Squares Regression?

41.What are Lasso, Ridge, and ElasticNet regularizations?

42.What is Logistic Regression?

43.What is stepwise regression?

44.How can nonlinear information be included in linear models?

45.What is a Naïve Bayes classifier?

46.What are Bernoulli and Multivariate Naïve Bayes?

47.What is a Gaussian?

48.What is Standard Scaling?

49.Why are statistical distributions important?

50.Can you compare your data with some distribution? What is a qq-plot?

51.What is a Gaussian Naïve Bayes?

52.What is another way to use Naïve Bayes with continuous data?

53.What is the Nearest Neighbor classification?

54.What are Support Vector Machines (SVM)?

55.What are SVM Kernel tricks?

56.What is K-Means Clustering?

57.Can you provide an example of Text Classification with Spark?

58.Where to go from here

59.Ultra-Quick introduction to Python

60.Ultra-Quick introduction to Probabilities

61.Ultra-Quick introduction to Matrices and Vectors

Excerpt from: Antonio Gulli. “A collection of Data Science Interview Questions Solved in Python and Spark: BigData and Machine Learning in Python and Spark (A Collection of Programming Interview Questions Book 6)”. iBooks.

Last updated: 2021-04-22 19:41:40
Please credit the source when reposting: http://blog.feelyou.top/posts/1984236213.html

