Machine Learning Interview Summary


Posted by Bruce Yang on July 13, 2018

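Machine Learning is the decisive part of a Data Scientist interview, because it is the core skill a Data Scientist makes a living on. Every interview round includes Machine Learning questions of one kind or another.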

Machine Learning interviews cover four dimensions:

  1. Breadth of Machine Learning
  2. Depth of Machine Learning
  3. Machine Learning Experience
  4. Machine Learning Applications

Part 1: Breadth of Machine Learning (a main focus of interviews)


I have summarized 15 commonly used Machine Learning algorithms:

  1. Linear Regression
  2. Regression with Lasso
  3. Regression with Ridge
  4. Stepwise Regression
  5. Logistic Regression
  6. Naïve Bayes
  7. K-Nearest Neighbors
  8. K-means Clustering
  9. Decision Tree
  10. Random Forest
  11. AdaBoost
  12. Gradient Boosting
  13. SVM (Support Vector Machine)
  14. PCA (Principal Component Analysis)
  15. Neural Networks

(If you are applying to IT companies in California, you also need to know Deep Learning and TensorFlow.)

I have summarized five general questions to ask about any Machine Learning algorithm, which I call The Big Five:

  1. What are the basic concepts? What problem does it solve?
  2. What are the assumptions?
  3. What are the steps of the algorithm?
  4. What is the cost function?
  5. What are the advantages/disadvantages?
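As an illustration of question 4, here is a minimal sketch of the cost functions for four algorithms from the list above: Linear Regression, Ridge, Lasso, and Logistic Regression. The function names, the `alpha` penalty parameter, and the plain-list data layout are assumptions made for this example; textbooks and libraries differ on scaling constants such as 1/n versus 1/(2n).

```python
import math

def predict(row, w):
    """Linear score: dot product of one feature row with the weights."""
    return sum(x * wi for x, wi in zip(row, w))

def mse_cost(X, y, w):
    """Linear regression cost: mean squared error."""
    return sum((predict(r, w) - t) ** 2 for r, t in zip(X, y)) / len(y)

def ridge_cost(X, y, w, alpha):
    """MSE plus an L2 penalty (sum of squared weights)."""
    return mse_cost(X, y, w) + alpha * sum(wi ** 2 for wi in w)

def lasso_cost(X, y, w, alpha):
    """MSE plus an L1 penalty (sum of absolute weights)."""
    return mse_cost(X, y, w) + alpha * sum(abs(wi) for wi in w)

def log_loss_cost(X, y, w):
    """Logistic regression cost: mean negative log-likelihood, labels in {0, 1}."""
    total = 0.0
    for r, t in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-predict(r, w)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y)
```

The only difference between Ridge and Lasso here is the penalty term. That is usually the starting point for explaining why Lasso can drive weights to exactly zero while Ridge only shrinks them.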

Part 2: Depth of Machine Learning (generally only tested by FLAG)



Part 3: Machine Learning Experience (a main focus of interviews)


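  1. How do you do feature engineering?
  2. What do you do when you have fewer data points than features?
  3. How do you handle imbalanced data classification?
  4. What do you do if the model's performance falls short of expectations?
  5. How do you handle missing data?
  6. How do you detect outliers, and how do you handle them?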
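A minimal sketch of one common answer each to questions 3, 5, and 6: inverse-frequency class weights, median imputation, and the 1.5 x IQR outlier rule. These are just simple baselines, not the only valid answers; the function names and the use of `None` to mark missing values are assumptions made for this example.

```python
import statistics
from collections import Counter

def impute_median(values):
    """Missing data: replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Outlier detection: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def balanced_class_weights(labels):
    """Imbalanced classes: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}
```

In an interview, the follow-up is usually about trade-offs: median imputation ignores relationships between features, the IQR rule assumes a roughly unimodal distribution, and class weights change the loss rather than the data (unlike over- or under-sampling).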

Read blog write-ups by experts and browse data science websites.

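Part 4: Machine Learning Applications (the key to landing interviews)

In short: these are the Machine Learning projects you have done.

Knowing Machine Learning concepts and derivations is not enough. Companies hire people to solve real problems, so you must have solid project development experience!

With a high-quality project, for example a high Kaggle ranking or an award, most companies will offer you an interview. Interviews usually do not hand you a pile of data and ask you to build a model on the spot. Instead, before the on-site you are given a Data Challenge, with about a week to build some models and write up some insights.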

Here is a short list of common Data Scientist deliverables:

  • Prediction (predict a value based on inputs)
  • Classification (e.g., spam or not spam)
  • Recommendations (e.g., Amazon, Netflix, Spotify recommendations)
  • Pattern detection and grouping (e.g., clustering, where the classes are not known in advance)
  • Anomaly detection (e.g., fraud detection)
  • Recognition (image, text, audio, video, facial, …)
  • Actionable insights (via dashboards, reports, visualizations, …)
  • Automated processes and decision-making (e.g., credit card approval)
  • Scoring and ranking (e.g., FICO score)
  • Segmentation (e.g., demographic-based marketing)
  • Optimization (e.g., risk management)
  • Forecasts (e.g., sales and revenue)


  1. Regression: prediction models, e.g., predicting house prices or stock prices.
  2. Classification: image classification, e.g., given a set of images, classify which are cats and which are dogs, or classify dog breeds. These are classic projects.
  3. Recommendation Systems: collaborative filtering is usually enough, e.g., Netflix-style movie recommendation based on ratings, or Spotify-style music recommendation based on historical streaming data.
  4. NLP (Natural Language Processing): build a spam email classifier, or predict market movements from news or from tweets by celebrities such as Donald Trump.
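For the recommendation case, here is a minimal sketch of user-based collaborative filtering on explicit ratings. The toy `ratings` data, the function names, and the choice to compute cosine similarity only over items both users rated are assumptions made for this example; production systems typically use matrix factorization or a dedicated library.

```python
import math

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 1},
    "bob":   {"A": 5, "B": 1, "C": 4},
    "carol": {"A": 1, "B": 5, "C": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity restricted to the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None  # None: no neighbor rated the item
```

Calling `predict_rating(ratings, "alice", "C")` gives a value closer to bob's rating of C than to carol's, because alice's ratings match bob's exactly on the items they share.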


An Introduction to Statistical Learning