Data Engineer (Hadoop)
We are looking to hire a data analyst to join our data team and take responsibility for managing master data sets, developing reports, and troubleshooting data issues. To do well in this role, you need a very fine eye for detail, experience as a data analyst, and a deep understanding of popular data analysis tools and databases.
• Use data wrangling techniques to convert data from one "raw" form into another, including data visualization, data aggregation, training statistical models, etc.
• Work with various relational and non-relational data sources, with the target being Azure-based SQL Data Warehouse and Cosmos DB repositories
• Clean, unify and organize messy and complex data sets for easy access and analysis
• Create different levels of abstractions of data depending on analytics needs
• Hands-on experience with data preparation activities using the Azure technology stack, especially Azure Databricks, is strongly preferred
• Implement discovery solutions for high-speed data ingestion
• Work closely with the Data Science team to perform complex analytics and data preparation tasks
• Work with the Sr. Data Engineers on the team to develop APIs
• Source data from multiple applications, then profile, cleanse, and conform it to create master data sets for analytics use
• Experience with Complex Data Parsing (Big Data Parser) and Natural Language Processing (NLP) Transforms on Azure is a plus
• Design solutions for managing highly complex business rules within the Azure ecosystem
• Mid-level to advanced knowledge of Python and PySpark is an absolute must
• Knowledge of Azure and the Hadoop 2.0 ecosystem (HDFS, MapReduce, Hive, Pig, Sqoop, Mahout, Spark, etc.) is a must
• Experience with Web Scraping frameworks (Scrapy or Beautiful Soup or similar)
• Significant programming experience (with the above technologies as well as Java, R, and Python on Linux) is a must
• Extensive experience working with data APIs (RESTful endpoints and/or SOAP)
• Excellent working knowledge of relational databases such as MySQL and Oracle
• Experience with Complex Data Parsing (Big Data Parser) is a must, including work with XML, JSON, and other custom complex data formats
• Knowledge of High-Speed Data Ingestion, Real-Time Data Collection and Streaming is a plus
• Bachelor’s degree in computer science or a related field
• 3-5 years of solid experience in Big Data technologies is a must