A rare opportunity to join a rapidly growing InsurTech firm financially backed by an industry leader. This position offers the chance to be part of a global success story that continues to evolve. You will have a deep understanding of relational database technologies and 3+ years' experience building large-scale data pipelines with Hadoop/Spark. Strong computer science fundamentals are advantageous, and you will bring a driven and ambitious mindset.
- You will work with architects, business partners and business analysts to understand requirements, design and build effective solutions.
- Utilize data engineering skills within and outside the developing information ecosystem for discovery, analytics and data management.
- Apply data wrangling techniques to convert data from one "raw" form into another, including data visualization, data aggregation, training statistical models etc.
- Create different levels of abstractions of data depending on analytics needs.
- Perform hands-on data preparation activities using the Hadoop technology stack.
- Implement discovery solutions for high speed data ingestion.
- Work closely with the Data leadership team to perform complex analytics and data preparation tasks.
- Work with various relational and non-relational data sources with the target being Hadoop based repositories.
- Source data from multiple applications; profile, cleanse and conform the data to create master data sets for analytics use.
- Design solutions for managing highly complex business rules within the Hadoop ecosystem.
- Performance tune data loads.
- Leverage visual analytics tools to communicate results of data analysis.
- 3-5 years of solid experience in Big Data technologies is a must.
- A computer science or related educational background.
- Knowledge of the Hadoop 2.0 ecosystem (HDFS, MapReduce, Hive, Pig, Sqoop, Mahout, Spark etc.) is a must.
- Significant programming experience with the above technologies, as well as Java, R and Python on Linux, is a must.
- Knowledge of a commercial distribution such as Hortonworks, Cloudera or MapR is a must.
- Excellent working knowledge of relational databases, HBase etc.
- Data visualization tool experience a plus.
- Natural Language Processing (NLP) skills with experience in Apache Solr and Python a plus.
- Knowledge of High-Speed Data Ingestion, Real-Time Data Collection and Streaming is a plus.