Amazon Data Scientist

Difficulty: hard

The role of an Amazon Data Scientist

Amazon is one of the world's largest e-commerce companies and a major technology firm. Through its e-commerce platform, its virtual assistant Alexa, and its cloud computing service AWS, Amazon has an ever-growing digital presence and constantly needs data scientists who can drive the company's growth through data-driven decisions.

Data Scientist salary at Amazon:

  • Entry-level salary: USD 171,000.
  • Senior positions: USD 580,000.
  • Median total compensation: USD 300,000, made up of a USD 151,000 base, USD 128,000 in stock and a USD 21,000 bonus.

The role of a data scientist at Amazon is not fixed; it depends on the team you join. Data science teams at Amazon include:

  • AWS (Amazon Web Services).
  • Virtual assistant service Alexa.
  • Demand forecasting team in the Supply Chain Optimization Technologies (SCOT).
  • The NASCO Team (North America Supply Chain Organization).
  • Middle Mile Planning Research Optimization Science (mmPROS) team, and many more. 

Here are the roles and responsibilities in a little more detail:

  • Design, develop, evaluate, deploy and update data-driven models and analytical solutions for machine learning (ML) and natural language (NL) applications.
  • Develop cutting-edge data pipelines, build accurate predictive models, and deploy automated software solutions that provide forecasting insights.
  • Research, design, and improve models with business impact in mind.
  • Demand Forecasting Team - Improve upon existing Demand Forecasting statistical or machine learning methodologies by developing new data sources, testing model enhancements, running computational experiments, and fine-tuning model parameters for new forecasting models.
  • SCOT Team - Analyse large amounts of data from different parts of the supply chain and their associated business functions. Improve upon existing machine learning methodologies by developing new data sources, developing and testing model enhancements.
  • AWS Team - Simplify and automate the forecasting process by building new tools and onboarding existing ones from Amazon Retail or AWS. Support engineering teams in building tools and applications on Amazon's unique big data platform to efficiently generate insights and deploy them into decision-making systems at AWS.

Skills/Qualifications preferred:

  • PhD in Artificial Intelligence, Computer Science, Statistics, Applied Math or a related field.
  • Previous experience in an ML, data scientist or optimization engineer role at a large technology company.
  • Experience in an operational environment developing, fast-prototyping, piloting and launching analytic products.
  • Ability to develop experimental and analytic plans for data modelling, use strong baselines, and accurately determine cause-and-effect relationships.
  • Experience in creating data-driven visualizations to describe an end-to-end system.
  • Experience with statistical software such as R, Weka, SAS or SPSS.
  • Ability to write SQL scripts for analysis and reporting (Redshift, SQL, MySQL).
  • Experience with one or more programming languages such as Python, R, Java, C++, VBA or MATLAB.
  • Excellent written and verbal communication skills.

Interview Guide

Like other big tech companies, Amazon follows a three-stage interview process for its Data Scientist role: an initial phone screen with a recruiter or hiring manager, a technical screen, and an onsite interview. The onsite round is a full-day event consisting of five one-on-one interviews with Amazon data scientists and an HR manager.

Initial Screen

Overview

The initial phone interview is a 30-minute get-to-know-you conversation with the company's recruiter or hiring manager. The questions are typically based on your resume: the recruiter reviews your past work experience and previous projects to assess whether they align with the role. The recruiter may also ask a few questions to gauge your motivation for the role and the company.

Technical Screen

Overview

After the initial phone screen, you will face a technical screen, also by phone, conducted by one of Amazon's data scientists. It usually lasts between 30 minutes and an hour and tests your skills in coding, statistics and machine learning. According to candidate reports on Glassdoor, there are at least two coding questions: one on SQL and one on algorithms. Think out loud while solving the questions so the interviewer can follow your thought process. Interviewers commonly ask you to discuss your approach: how you arrived at the solution and why you chose the steps you used.

Tips

  • Brush up your SQL skills.
  • Practice core statistics concepts and their applications.
  • Revise machine learning theory and its application to real-life problems, which may have a business dimension.
  • Follow the "think out loud" approach while solving problems.

Interview Questions

Most important interview questions:

Machine learning/AI

  • How does a logistic regression model know what the coefficients are?
  • What is the difference between convex and non-convex cost functions? What does it mean when a cost function is non-convex?
  • Is random weight assignment better than assigning the same weights to the units in the hidden layer?
  • Given a bar chart, imagine pouring water over it from the top. How would you quantify how much water the chart can hold?
  • What is Overfitting?
  • How would a change in the Prime membership fee affect the market?
  • Why is gradient checking important?
  • Describe decision trees, SVMs, random forests and boosting. What are their advantages and disadvantages?
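For the logistic regression and gradient checking questions above, a common answer is that the model does not "know" its coefficients in closed form: they are the weights that minimise the average log loss (equivalently, maximise the likelihood), found iteratively. Because that loss is convex in the weights, gradient descent converges toward the global minimum rather than getting stuck in a local one. Gradient checking compares a hand-derived gradient with a finite-difference estimate to catch derivation bugs before training. The following is a minimal pure-Python sketch; the function names, learning rate, step count and synthetic data are illustrative assumptions, and production libraries typically use other solvers such as Newton-type methods or L-BFGS.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(w, x):
    """Predicted probability of class 1 for one feature row x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


def log_loss(w, X, y):
    """Average negative log-likelihood (binary cross-entropy)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict(w, xi), 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def analytic_grad(w, X, y):
    """Hand-derived gradient of log_loss: the mean of (p - y) * x."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = predict(w, xi) - yi
        for j, xj in enumerate(xi):
            g[j] += err * xj
    return [gj / len(X) for gj in g]


def numerical_grad(w, X, y, eps=1e-5):
    """Central finite differences; slow, used only to check analytic_grad."""
    g = []
    for j in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        g.append((log_loss(w_plus, X, y) - log_loss(w_minus, X, y)) / (2 * eps))
    return g


def fit(X, y, lr=0.5, steps=1000):
    """Plain batch gradient descent starting from all-zero weights."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = analytic_grad(w, X, y)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
    return w


# Toy data: column 0 is a constant 1.0 so w[0] acts as the intercept
random.seed(0)
X = [[1.0, random.uniform(-2.0, 2.0)] for _ in range(200)]
y = [1 if row[1] + random.gauss(0.0, 0.5) > 0 else 0 for row in X]

# Gradient check at an arbitrary point: the two estimates should agree closely
w_probe = [0.3, -0.7]
max_gap = max(abs(a - b) for a, b in zip(analytic_grad(w_probe, X, y),
                                          numerical_grad(w_probe, X, y)))
w_fit = fit(X, y)
print("max gradient gap:", max_gap)
print("fitted weights:", w_fit)
```

In practice the gradient check is run once on a small sample while developing the model, then switched off, since the finite-difference version needs two loss evaluations per weight.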
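The bar-chart water question above is the classic "trapping rain water" problem. A sketch, assuming bars of width 1 and non-negative heights: the water above each bar equals the smaller of the tallest bar to its left and the tallest bar to its right, minus the bar's own height. Two pointers compute this in O(n) time and O(1) extra space.

```python
def trapped_water(heights):
    """Units of water held by a bar chart whose bars have width 1."""
    left, right = 0, len(heights) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        if heights[left] < heights[right]:
            # A bar at least this tall exists on the right, so left_max bounds the water here
            left_max = max(left_max, heights[left])
            water += left_max - heights[left]
            left += 1
        else:
            # Symmetric case: left_max >= heights[right], so right_max is the bound
            right_max = max(right_max, heights[right])
            water += right_max - heights[right]
            right -= 1
    return water


print(trapped_water([0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1]))  # 6
```

The pointer on the lower side always moves, because the water above that bar is already fully determined by the running maximum on its own side.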

SQL

  • Write an SQL query to compute the month-to-month user retention rate.
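One way to approach the retention question, assuming a hypothetical table `user_activity(user_id, activity_date)` with one row per user action: define month M's retention rate as the share of users active in M who are also active in month M + 1. The sketch runs the query with Python's built-in `sqlite3` module on toy data. The date functions (`date(..., 'start of month')`, `'+1 month'`) are SQLite-specific, so other engines such as Redshift or MySQL need their own month-truncation and interval syntax.

```python
import sqlite3

RETENTION_SQL = """
WITH monthly AS (
    -- one row per user per calendar month in which they were active
    SELECT DISTINCT user_id, date(activity_date, 'start of month') AS activity_month
    FROM user_activity
)
SELECT prev.activity_month,
       COUNT(cur.user_id) * 1.0 / COUNT(prev.user_id) AS retention_rate
FROM monthly AS prev
LEFT JOIN monthly AS cur
       ON cur.user_id = prev.user_id
      AND cur.activity_month = date(prev.activity_month, '+1 month')
-- skip the latest month: there is no following month to observe yet
WHERE date(prev.activity_month, '+1 month') <= (SELECT MAX(activity_month) FROM monthly)
GROUP BY prev.activity_month
ORDER BY prev.activity_month
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_activity (user_id INTEGER, activity_date TEXT)")
conn.executemany(
    "INSERT INTO user_activity VALUES (?, ?)",
    [
        (1, "2024-01-05"), (2, "2024-01-10"), (3, "2024-01-20"), (4, "2024-01-25"),
        (1, "2024-02-02"), (2, "2024-02-14"), (5, "2024-02-15"),
        (1, "2024-03-01"), (5, "2024-03-09"),
    ],
)
rows = conn.execute(RETENTION_SQL).fetchall()
for month, rate in rows:
    print(month, round(rate, 2))
# 2024-01-01 0.5
# 2024-02-01 0.67
```

The `DISTINCT` in the CTE matters: without it, a user with several actions in one month would be counted several times in both the numerator and the denominator.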

Maths/Statistics

  • Explain the math behind the principal component analysis
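A compact outline of the derivation usually expected, assuming a centred n × d data matrix X (rows are observations, columns are features; the notation is ours). The first principal component is the unit direction along which the projected data has maximum variance:

```latex
% Sample covariance of the centred data (d x d, symmetric)
\Sigma = \frac{1}{n-1} X^\top X

% First component: the unit vector w maximising the variance of the projection Xw
\max_{w} \; w^\top \Sigma w \quad \text{subject to} \quad w^\top w = 1

% Lagrangian and stationarity condition
\mathcal{L}(w, \lambda) = w^\top \Sigma w - \lambda \left( w^\top w - 1 \right)
\nabla_w \mathcal{L} = 2 \Sigma w - 2 \lambda w = 0 \;\Rightarrow\; \Sigma w = \lambda w

% The projected variance equals the eigenvalue, so choose the largest one
w^\top \Sigma w = \lambda \, w^\top w = \lambda
```

Later components are the eigenvectors with the next-largest eigenvalues, which can be chosen mutually orthogonal because Σ is symmetric; component k explains a fraction λ_k / (λ_1 + ... + λ_d) of the total variance. In practice PCA is usually computed from the singular value decomposition X = U S Vᵀ: the columns of V are the principal directions, and the eigenvalues of Σ are s_k² / (n − 1).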

Coding

  • If given an integer n and an array of numbers, give out the histogram divided into n bins.
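A sketch for the histogram question, assuming equal-width bins spanning the data's minimum to maximum (the question does not specify the binning rule). Each bin is half-open except the last, which also includes the maximum, so every value is counted exactly once.

```python
def histogram(values, n):
    """Count values into n equal-width bins over [min(values), max(values)].

    Returns (edges, counts) with len(edges) == n + 1 and len(counts) == n.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not values:
        raise ValueError("values must be non-empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values identical: widen the range by 0.5 on each side (an arbitrary choice)
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / n
    counts = [0] * n
    for v in values:
        idx = int((v - lo) / width)
        counts[min(idx, n - 1)] += 1  # the maximum falls into the last bin
    edges = [lo + i * width for i in range(n + 1)]
    return edges, counts


print(histogram([1, 2, 2, 3, 4, 5, 5, 5, 9, 10], 3))
# ([1.0, 4.0, 7.0, 10.0], [4, 4, 2])
```

Computing the bin index arithmetically keeps the loop O(len(values)); with unequal bin edges you would instead locate each value with a binary search over the edges.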


Onsite Round

Overview

After you clear the technical phone screen, the recruiter will invite you to an onsite interview at one of Amazon's campuses. The onsite loop comprises five to six interviews over the day and lasts about six hours in total. The interview panel has two members: an HR manager and an Amazon data scientist.

Following are the interviews you can expect, in no particular order, as part of the onsite round:

  • Machine learning and modelling interview with a data scientist
  • Data analysis and A/B testing interview
  • Interview with a data scientist to test your SQL skills
  • Algorithms and optimizations interview
  • Behavioural interview with an HR manager to assess your cultural and experiential fit for Amazon

Tips

  • Brush up on machine learning as it's a core focus.
  • Be aware of Amazon's products and services.
  • Learn Amazon's 14 Leadership Principles. Questions on them are asked at every stage of the interview.

Interview Questions

Most important questions asked in the onsite round:

  • How would you deal with imbalanced data, where the ratio of positive to negative examples is very large?
  • Write a Python function that displays the first n Fibonacci numbers.
  • Given a large string and a smaller string, write code to determine whether the smaller string can be formed from the letters of the larger string.
  • The probability that an item is at location A is 0.6, and the probability that it is at location B is 0.8. What is the probability that the item will be found on the Amazon website?
  • Implement the union and intersection of two arrays efficiently. Elements may repeat within each input array but must not repeat in the union and intersection arrays.
  • You have two files in HDFS. One contains date ranges, with two columns: Start date and End date. The other has two columns: Date and number of visitors. Write Spark code that returns the date range with the most visitors.
  • Implement a circular queue using an array.
  • What’s the difference between Lasso and Ridge Regression?
  • As users navigate the Amazon website, they perform many actions: clicking buttons, running searches, and so on. How would you model these front-end actions to predict whether a user's next action will be a purchase?
  • How does K-means work? What kind of distance metric would you choose? What if different features have different dynamic ranges?
  • What are generative and discriminative algorithms? What are their strengths and weaknesses? Which type of algorithms are usually used and why?
  • How would you modify a table with over a billion rows?
  • Describe a project where you were on a tight deadline and still finished it on time.
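For the Fibonacci question above, an iterative sketch. It assumes the sequence starts 0, 1 (some interviewers start from 1, 1) and returns a list rather than printing directly, which makes it easier to test. The loop runs in O(n) time, avoiding the exponential blow-up of naive recursion.

```python
def first_n_fibonacci(n):
    """Return the first n Fibonacci numbers, starting 0, 1, 1, 2, ..."""
    if n < 0:
        raise ValueError("n must be non-negative")
    fibs = []
    a, b = 0, 1
    for _ in range(n):
        fibs.append(a)
        a, b = b, a + b  # advance the pair one step along the sequence
    return fibs


print(first_n_fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```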
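For the string question above, count the letters in both strings and check that the larger string has at least as many of each letter as the smaller one needs. This sketch assumes each letter of the larger string can be used at most once and that matching is case-sensitive; it runs in O(len(small) + len(large)) time using `collections.Counter`.

```python
from collections import Counter


def can_build(small, large):
    """True if small can be spelled using each character of large at most once."""
    available = Counter(large)
    needed = Counter(small)
    # Counter returns 0 for missing keys, so absent letters fail the check
    return all(available[ch] >= count for ch, count in needed.items())


print(can_build("aab", "baaxyz"))  # True
print(can_build("aab", "abxyz"))   # False
```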
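The probability question above is underspecified. A common reading assumes the item is found on the website if it is available at location A or at location B, and that the two events are independent. Under those assumptions:

```latex
P(A \cup B) = 1 - P(\bar{A})\,P(\bar{B}) = 1 - (1 - 0.6)(1 - 0.8) = 1 - 0.4 \times 0.2 = 0.92
```

Without the independence assumption, only bounds follow: P(A ∪ B) lies between max(0.6, 0.8) = 0.8 and min(1, 0.6 + 0.8) = 1. Stating the assumption out loud is usually part of what the interviewer is looking for.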
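For the union and intersection question above, hash sets give expected O(m + n) time. This sketch assumes hashable elements and keeps results in first-appearance order, which is a choice for deterministic output rather than a requirement of the question.

```python
def union_and_intersection(a, b):
    """Distinct union and intersection of two lists, in first-appearance order."""
    set_b = set(b)
    union, intersection = [], []
    seen_union, seen_inter = set(), set()
    for x in a:
        if x not in seen_union:
            seen_union.add(x)
            union.append(x)
        if x in set_b and x not in seen_inter:
            seen_inter.add(x)
            intersection.append(x)
    for x in b:
        # Elements of b that never appeared in a still belong to the union
        if x not in seen_union:
            seen_union.add(x)
            union.append(x)
    return union, intersection


print(union_and_intersection([1, 2, 2, 3, 4], [3, 4, 4, 5]))
# ([1, 2, 3, 4, 5], [3, 4])
```

If both arrays are already sorted, a two-pointer merge produces the same results in O(m + n) time without any hash sets.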
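For the HDFS date-range question above, running Spark is outside the scope of a short sketch, so the following plain-Python version (not Spark code) shows only the core logic under assumed inputs: dates already parsed, ranges inclusive on both ends with start ≤ end, and duplicate dates in the visitors file summed. In Spark, the same idea is typically expressed as a join of the two datasets on the condition that the date falls between start and end, followed by a group-by on the range and a sum of visitors. The prefix-sum approach below answers each range in O(log n) after sorting.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def busiest_range(ranges, daily_visitors):
    """Return ((start, end), total_visitors) for the range with the most visitors."""
    totals = {}
    for day, visitors in daily_visitors:
        totals[day] = totals.get(day, 0) + visitors
    days = sorted(totals)
    prefix = [0]  # prefix[i] = visitors on days[0] .. days[i - 1]
    for day in days:
        prefix.append(prefix[-1] + totals[day])

    best, best_total = None, -1
    for start, end in ranges:
        lo = bisect_left(days, start)  # first recorded day >= start
        hi = bisect_right(days, end)   # one past the last recorded day <= end
        total = prefix[hi] - prefix[lo]
        if total > best_total:
            best, best_total = (start, end), total
    return best, best_total


ranges = [(date(2024, 1, 1), date(2024, 1, 3)), (date(2024, 1, 2), date(2024, 1, 5))]
daily = [(date(2024, 1, d), v) for d, v in [(1, 10), (2, 20), (3, 5), (4, 30), (5, 1)]]
print(busiest_range(ranges, daily))
```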
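For the circular queue question above, a sketch using a fixed-size Python list as the array. It stores the head index and the element count rather than separate head and tail pointers, which avoids the classic ambiguity where head == tail could mean either empty or full (the usual alternative is to leave one slot permanently unused). The class and method names are illustrative.

```python
class CircularQueue:
    """Fixed-capacity FIFO queue backed by a list used as a ring buffer."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = [None] * capacity
        self._head = 0  # index of the front element
        self._size = 0  # number of stored elements

    def __len__(self):
        return self._size

    def is_empty(self):
        return self._size == 0

    def is_full(self):
        return self._size == len(self._data)

    def enqueue(self, item):
        if self.is_full():
            raise OverflowError("queue is full")
        tail = (self._head + self._size) % len(self._data)  # wraps around the end
        self._data[tail] = item
        self._size += 1

    def dequeue(self):
        if self.is_empty():
            raise IndexError("dequeue from empty queue")
        item = self._data[self._head]
        self._data[self._head] = None  # release the reference
        self._head = (self._head + 1) % len(self._data)
        self._size -= 1
        return item

    def peek(self):
        if self.is_empty():
            raise IndexError("peek from empty queue")
        return self._data[self._head]
```

Every operation is O(1), and the modulo arithmetic is what lets the tail wrap back to index 0 once earlier slots have been freed by `dequeue`.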
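For the Lasso versus Ridge question above, the two methods differ only in the penalty added to the least-squares loss. Written for a coefficient vector β and regularisation strength λ ≥ 0, assuming centred data so there is no intercept to leave unpenalised:

```latex
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1

% Setting the ridge gradient -2 X^T (y - X beta) + 2 lambda beta to zero gives a closed form
\hat{\beta}^{\text{ridge}} = \left( X^\top X + \lambda I \right)^{-1} X^\top y
```

The L1 penalty can shrink some coefficients exactly to zero, so Lasso performs feature selection but has no closed-form solution in general; the L2 penalty shrinks all coefficients smoothly toward zero, which stabilises the estimates under multicollinearity. Some libraries scale the squared-error term (for example by 1/(2n)), which changes the numerical meaning of λ.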

The Amazon Data Scientist interview becomes much more manageable if your preparation prioritises data science algorithms, machine learning, probability, database management and coding. Focusing on the areas covered in this guide gives your preparation a solid foundation.


Frequently Asked Questions

What are the stages of the Data Scientist interview at Amazon?

There are three stages: the initial screen, the technical screen, and the onsite round.

How long is the Amazon Data Scientist interview?

The initial screen lasts 30 minutes, while the technical screen lasts between 30 minutes and an hour. The onsite round is the longest and can last up to 6 hours.
