Opportunity to make a positive impact
Flexible working options
Attractive salary & benefits
You will be responsible for:
Understand long-term and short-term business requirements and precisely match them with the capabilities of the many distributed storage and computing technologies available in the ecosystem.
Create complex data processing pipelines
Design scalable implementations of the models developed by our Data Scientists.
Deploy data pipelines to production systems following CI/CD practices
Create and maintain clear documentation of data models/schemas as well as transformation/validation rules
Troubleshoot and remediate data quality issues raised by pipeline alerts or downstream consumers
Skills Required:
1-3 years of overall industry experience building and deploying large-scale data processing pipelines in a production environment.
Experience building data pipelines and data-centric applications using distributed storage platforms such as HDFS, S3, and NoSQL databases (HBase, Cassandra, etc.), and distributed processing and orchestration platforms such as Hadoop, Spark, Hive, Oozie, Airflow, etc.
Hands-on experience with MapR, Cloudera, Hortonworks, and/or cloud-based Hadoop distributions (AWS EMR, Azure HDInsight, Qubole, etc.)
Practical experience working with well-known data engineering tools and platforms such as Kafka, Spark, and Hadoop
Solid understanding of data modelling, ML, and AI concepts
Proficiency in Python
Education: B.E., B.Tech, M.Tech, or MS
Our Client provides Data and AdTech strategy consultation to leading internet websites and apps.