Data Engineer (Hadoop) in San Pedro Garza García, Nuevo León for TECH MAHINDRA MEXICO CLOUD SERVICES, S. DE R.L. DE C.V. - Hireline México


$50,000 to $60,000 MXN (gross)

Nuevo León

Full-time employee

English level: Advanced

Data Engineer (Hadoop)

We are looking to hire a data analyst to join our data team. This person will be responsible for managing master data sets, developing reports, and troubleshooting data issues. To do well in this role, you need a very fine eye for detail, experience as a data analyst, and a deep understanding of popular data analysis tools and databases.

• Use data wrangling techniques to convert data from one "raw" form into another, including data visualization, data aggregation, training statistical models, etc.
• Work with various relational and non-relational data sources, targeting Azure-based SQL Data Warehouse and Cosmos DB repositories
• Clean, unify and organize messy and complex data sets for easy access and analysis
• Create different levels of data abstraction depending on analytics needs
• Hands-on experience with data preparation on the Azure technology stack, especially Azure Databricks, is strongly preferred
• Implement discovery solutions for high-speed data ingestion
• Work closely with the Data Science team to perform complex analytics and data preparation tasks
• Work with the Sr. Data Engineers on the team to develop APIs
• Source data from multiple applications, then profile, cleanse, and conform it to create master data sets for analytics use
• Experience with Complex Data Parsing (Big Data Parser) and Natural Language Processing (NLP) transforms on Azure is a plus
• Design solutions for managing highly complex business rules within the Azure ecosystem

Minimum Requirements:

• Mid- to advanced-level knowledge of Python and PySpark is an absolute must
• Knowledge of Azure and the Hadoop 2.0 ecosystem (HDFS, MapReduce, Hive, Pig, Sqoop, Mahout, Spark, etc.) is a must
• Experience with web scraping frameworks (Scrapy, Beautiful Soup, or similar)
• Significant programming experience (with the above technologies as well as Java, R, and Python on Linux) is a must
• Extensive experience working with data APIs (RESTful and/or SOAP endpoints)
• Excellent working knowledge of relational databases (MySQL, Oracle, etc.)
• Experience with Complex Data Parsing (Big Data Parser) is a must; should have worked with XML, JSON, and other custom complex data formats
• Knowledge of high-speed data ingestion, real-time data collection, and streaming is a plus
• Bachelor's degree in computer science or a related field
• 3-5 years of solid experience in Big Data technologies is a must