• Install, set up, and maintain multi-node Hadoop clusters
• Integrate applications, platforms, and network infrastructure
• Responsible for capacity planning to match business and product needs
• Develop solutions that will provide monitoring, logging, and alerting capabilities. Use monitoring tools and metrics to analyze performance and characterize limitations and bottlenecks.
• Deploy the latest in CI/CD technologies and containerized solutions. Maintain and improve CI/CD infrastructure and pipelines to rapidly deliver software applications.
• Work closely with data engineers, data scientists, and the data architecture team
• Follow an agile development methodology
• BS/MS degree in Computer Science, Engineering, or a related field
• More than 4 years of experience managing multi-cluster Hadoop distributions and ecosystem components such as Spark, Hive, and Oozie, with hands-on experience using open-source tools to manage large distributed systems
• Strong skills in Hadoop, Linux, and shell scripting
• Experienced with Python or Java
• Experienced with message queues such as Kafka or RabbitMQ
• Experienced with building alerting and monitoring systems
• Knowledge of DevOps and automation concepts
• Knowledge of Elasticsearch is a plus
• Strong analytical, troubleshooting, and problem-solving skills
• Experience working in a start-up environment or an organization with an agile culture
• Candidates with less experience may be considered for a Platform Engineer role