The data engineer is responsible for designing, implementing, and maintaining the systems and processes necessary for efficient data collection, storage, and analysis. They work closely with the product team, the data labeling team, and other stakeholders to understand data requirements and ensure that data pipelines are robust, scalable, and optimized for performance.
Data Pipeline Development:
Design, develop, and maintain scalable and efficient data pipelines to collect, process, and store large volumes of structured and unstructured data from various sources.
Data Integration:
Integrate data from disparate sources, including databases, APIs, and third-party applications, to create a unified view of the organization's data.
Data Modeling:
Design and implement data models to support analytical and reporting needs, ensuring data integrity, consistency, and accuracy.
Data Transformation:
Transform raw data into formats suitable for analysis and reporting, applying data cleansing, normalization, and enrichment techniques as necessary.
Performance Optimization:
Optimize data pipelines and database queries for performance, scalability, and reliability, minimizing processing times and resource utilization.
Data Quality Assurance:
Implement data quality checks and validation processes to ensure the accuracy, completeness, and consistency of data throughout its lifecycle.
Data Security:
Implement and enforce data security and privacy measures to protect sensitive information and ensure compliance with relevant regulations.
Monitoring and Maintenance:
Monitor data pipelines and systems for performance issues, errors, and anomalies, and proactively address them to minimize downtime and data loss.
Documentation:
Document data engineering processes, workflows, and systems architecture to facilitate knowledge sharing and collaboration among team members.
Collaboration:
Collaborate with cross-functional teams, including the product team, analysts, software engineers, and business stakeholders, to understand data requirements and deliver solutions that meet business objectives.
● Bachelor's degree in computer science, engineering, or a related field.
● Proven experience in data engineering or a related role, with expertise in designing and building data pipelines, data warehousing, and ETL processes.
● Proficiency in programming languages commonly used in data engineering, such as Python, Java, Scala, or SQL.
● Experience with distributed computing frameworks, such as Hadoop, Spark, or Kafka.
● Strong database skills, including SQL query optimization, database design, and performance tuning.
● Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and big data technologies (e.g., Hadoop, Spark, Hive, HBase).
● Excellent problem-solving and analytical skills, with a keen attention to detail.
● Strong communication and collaboration skills, with the ability to work effectively in a cross-functional team environment.
● Experience with data visualization tools (e.g., Tableau, Power BI) and machine learning frameworks (e.g., TensorFlow, PyTorch) is a plus.
● Ability to work independently and manage multiple tasks and priorities effectively.
● Willingness to learn and adapt to new technologies and methodologies in the rapidly evolving field of data engineering.
● Strong commitment to data quality, integrity, and security.
● Proactive and results-oriented approach to problem-solving and decision-making.
● Ability to thrive in a fast-paced, dynamic environment and drive continuous improvement through innovation and creativity.
At Lune, you will have the opportunity to work on cutting-edge technologies and collaborate with a talented team to solve complex challenges and drive meaningful impact for our clients. We offer competitive compensation, comprehensive benefits, and a supportive work environment where your ideas and contributions are valued.