How would you approach designing a translation system that supports both text and speech in real-time?

Interview question asked of Machine Learning Engineers, Technical Program Managers, Engineering Managers, and other roles interviewing at DailyHunt, Aurora, Nutanix, and others.