Much like several other top companies, Amazon treats data engineering as one of its most critical roles and is expanding hiring for it rapidly. Data engineering involves collecting and validating the data that helps the company meet its objectives. Data engineers face distinctive challenges in deciding which data must be selected, processed, and shaped, and doing this competently makes it one of the most challenging jobs out there.
Data Engineers work alongside product managers, designers, data scientists, and software engineers, and are an integral part of the team. They are responsible for extracting most of the data and building the pipelines that transform it, which the rest of the team then works with. In simple words, data engineers manage the data and data scientists explore it.
Some qualities of a Data Engineer at Amazon:
- Good command of SQL and database systems, along with other programming languages, is essential for working with complex datasets.
- In-depth knowledge of and experience with big data processing frameworks, such as Hadoop or Apache Spark, and analytical environments.
Additionally, Amazon operates by its 14 Leadership Principles and looks for people who live these principles every day.
The Amazon Data Engineering interview can be broadly divided into 3 rounds.
After you apply for the job, a screening round is conducted, which is a phone interview with a recruiter. Candidates then proceed to the second round, a technical phone interview with questions focusing on SQL and Data Modeling. This is followed by the third and final round, the onsite round, which typically consists of 3-4 interviews focusing on SQL, Database Management, Data Warehousing, and behavioral topics.
This is the first telephonic screening interview.
What the interviewer will assess
The recruiter will assess your knowledge of SQL and Data Modeling. They may also ask you to solve basic coding problems in Python.
- You must be prepared for the “Tell me about yourself” starter.
- Have a solid understanding of SQL and Python.
- Keep your answers short and crisp.
Sample interview questions:
- Tell me about yourself. Why do you want to join this company?
- Give the basics of the window function in SQL.
- Write the required SQL queries for a given order table.
- Write code for the Travelling Salesman problem and explain it.
- Write code to find any two (or three) numbers in a given array whose sum equals x.
Note: Questions related to applications of Pandas in Python may also be asked.
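As an illustration of the pair-sum question above, here is a minimal sketch in Python. It assumes the interviewer wants any one matching pair and accepts a hash-set approach; the function name `find_pair_with_sum` is hypothetical and chosen only for this example.

```python
from typing import Optional, Sequence, Tuple


def find_pair_with_sum(nums: Sequence[int], x: int) -> Optional[Tuple[int, int]]:
    """Return one pair of values from nums that adds up to x, or None.

    Hypothetical helper for illustration; runs in O(n) time using a set.
    """
    seen = set()  # values visited so far
    for value in nums:
        complement = x - value
        if complement in seen:
            # An earlier value plus the current one equals x.
            return (complement, value)
        seen.add(value)
    return None


if __name__ == "__main__":
    print(find_pair_with_sum([2, 7, 11, 15], 9))
```

The single pass trades O(n) extra memory for speed. If the interviewer rules out extra memory, sorting the array and moving two pointers inward is a common alternative.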
In this round, there is usually just one interview, and you’ll be talking to a Data Engineer. Your knowledge of the following topics will be tested:
- Data Warehousing, ETL
- Data Modeling
- Data Structures
You can expect some scenario-based questions on these topics. They may also ask about your experience on previous projects.
What the interviewer will assess
- Fundamentals of SQL, such as joins, subqueries, aggregations, filters, and case statements, applied to scenario-based questions.
- Your speed and efficiency in solving coding problems, and most importantly, your approach towards the problem.
- Don’t rush into coding; clarify all doubts beforehand.
- Think out loud so that the interviewer understands your approach. Sometimes they may give you hints indirectly, so you should use the opportunity to either reconsider the approach or explain why you tackled the problem that way.
- The interviewer wants to assess how you handle a problem, so it’s fine to check in with them on small syntax issues.
Sample interview questions:
- Given two binary trees, determine whether they have the same inorder traversal.
- Given an order table, write the SQL queries for the desired output. For example, find the maximum frequency of a name in a group.
Note: Questions related to linked lists, doubly linked lists, stacks, queues, and performance tuning of a program may also be asked.
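The binary-tree question above can be approached by walking both trees in inorder and comparing the values as they are produced. The sketch below is one possible answer, not a definitive one. The `Node` class and the function names are assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Iterator, Optional


@dataclass
class Node:
    """Hypothetical binary tree node used only for this example."""
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def inorder(root: Optional[Node]) -> Iterator[int]:
    """Yield values in left, root, right order using an explicit stack."""
    stack = []
    current = root
    while stack or current is not None:
        # Descend as far left as possible, remembering the path.
        while current is not None:
            stack.append(current)
            current = current.left
        current = stack.pop()
        yield current.value
        current = current.right


def same_inorder(a: Optional[Node], b: Optional[Node]) -> bool:
    """Return True if both trees produce the same inorder sequence."""
    missing = object()  # marks the end of the shorter traversal
    pairs = zip_longest(inorder(a), inorder(b), fillvalue=missing)
    return all(x == y for x, y in pairs)


if __name__ == "__main__":
    balanced = Node(2, Node(1), Node(3))
    skewed = Node(1, None, Node(2, None, Node(3)))
    print(same_inorder(balanced, skewed))
```

Because the traversals are generators, the comparison stops at the first mismatch instead of materializing both sequences, which is worth mentioning when explaining the approach.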
This is the hardest of the three rounds, as it focuses on problem-solving skills through scenario-based questions. The following interviews take place:
- 2 Technical interviews; 45-60 minutes each
- Bar Raiser Round
- HR Round
The technical interviews sometimes have a “mixed” aspect, with questions ranging from Data Warehousing definitions to real-life problems solved in past projects.
Three broad types of interviews take place in the onsite round:
1. Technical Interviews:
The topics tested during the technical interviews are:
- Data Warehousing
- Data Modeling
- Complex SQL
- Big Data Technology
What the interviewer will assess:
- Problem analysis: How well you understand the problem, how you use the data, and which use cases you solve.
- SQL: Both fundamentals and complex SQL will be tested, including how well you handle queries.
- Data Modeling: How well you understand the need for the data and how it supports the use cases. Further, they will assess how you execute this in the form of SQL queries.
- Data Modeling: Incorporate only the relevant details in the data model. Don’t include unnecessary details and complicate the model with tangents that aren’t core to the problem you’re solving.
- SQL: You must have a good command of SQL. Practice joins, aggregate functions, analytical functions, correlated subqueries, etc.
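To make the analytical (window) functions mentioned above concrete, here is a small sketch that runs through Python's built-in `sqlite3` module. The `orders` table, its columns, and the sample rows are invented for illustration, and the query assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3


def rank_and_running_total():
    """Rank each customer's orders by amount and keep a running total.

    The orders table and its rows are made up for this example.
    Requires SQLite 3.25+ for window function support.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("alice", 30), ("alice", 50), ("bob", 20), ("alice", 10)],
    )
    query = """
        SELECT customer,
               order_id,
               amount,
               -- position of the order within its customer, largest first
               RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS amount_rank,
               -- cumulative spend per customer in order_id sequence
               SUM(amount) OVER (PARTITION BY customer ORDER BY order_id) AS running_total
        FROM orders
        ORDER BY customer, order_id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in rank_and_running_total():
        print(row)
```

Unlike `GROUP BY`, the window functions keep every input row and add the computed columns alongside it, which is the distinction interviewers usually probe.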
Sample interview questions:
- Explain the design schemas - star schema and snowflake schema (Data Warehousing)
- Write SQL queries for a given order table (this one is frequently asked)
- Design a data model to track products from vendors to the warehouse and then ultimately to customers.
- Create a data model for a multinational company like Amazon.
- Given a list of edges and nodes in a graph, write code to find the minimal canopy count.
- What is the difference between a correlated query and a nested query?
- What is a chasm trap?
- What is an index? Give the different types of indexes and explain and differentiate between them.
- Create required tables for eBay, define necessary relations, primary keys, and foreign keys.
- Design a simple OLTP architecture for RedBus.
- Write SQL queries to find groups having exactly three different tags.
Note: Practice questions related to designing a data pipeline given a specific reporting need.
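As one example from the list above, the question about groups with exactly three different tags can be answered with `GROUP BY` and `HAVING COUNT(DISTINCT ...)`. The sketch below runs the query through Python's `sqlite3` module. The `group_tags` table and its columns are assumptions, since the original question does not give a schema.

```python
import sqlite3


def groups_with_three_tags(rows):
    """Return ids of groups that have exactly three distinct tags.

    rows is a list of (group_id, tag) tuples; the group_tags table
    is a hypothetical schema used only for this example.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE group_tags (group_id INTEGER, tag TEXT)")
    conn.executemany("INSERT INTO group_tags VALUES (?, ?)", rows)
    query = """
        SELECT group_id
        FROM group_tags
        GROUP BY group_id
        HAVING COUNT(DISTINCT tag) = 3  -- duplicates of a tag count once
        ORDER BY group_id
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, "a"), (1, "b"), (1, "c"), (1, "a"),  # three distinct tags
        (2, "a"), (2, "b"),                      # two distinct tags
        (3, "a"), (3, "b"), (3, "c"), (3, "d"),  # four distinct tags
    ]
    print(groups_with_three_tags(sample))
```

`COUNT(DISTINCT tag)` matters here: a plain `COUNT(*)` would miscount group 1, which repeats the tag `a`.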
2. Bar Raiser Round
In the bar raiser round, you will be asked indirect questions about the 14 Leadership Principles.
This interview examines your ability to influence peers, collaborators, and stakeholders in cross-functional roles, as well as your previous experiences in doing so.
Sample interview questions:
- Tell us about a time when you worked under strict timelines.
- Have you ever had some kind of a conflict with your manager? How did you resolve it?
- Tell me about a time you made a mistake. How did you communicate your mistakes to the team?
- Tell me about a time you faced a crisis at work. How did you handle it?
Note: Don’t forget to incorporate the 14 Leadership Principles in your answers!
3. HR Round
For the HR round, questions may revolve around your previous projects and the use cases you have worked on in the past. Multiple HR rounds may be conducted to test different abilities. Group discussions may also take place.
- They may ask about projects you’ve previously worked on, so be prepared for that. Don’t forget to relate them to data engineering concepts.
- They may ask technical questions about your projects, such as issues you faced while building pipelines and how you managed the databases and tables involved.