Siftee.io (VC-backed data discovery start-up)
Siftee.io is a VC-backed company founded by ex-McKinsey and Salesforce leaders. Our vision is to be the first and only place analysts will come to find external data.
We help users search and download data (public and premium), using our smart filters to match search results to their intended analysis. We also help premium data providers get discovered earlier in the search process, with a data quality score that distinguishes them from the competition.
We want to be the #1 data search platform within the next 5 years.
We aim to let our users search at least 50,000 data sources on Siftee within the next 6 months, so our priority is to improve and scale the data acquisition process. The chosen candidate will work alongside the Chief Product Officer and Full Stack Developer to achieve this.
What you will do:
- Dive deep into the challenges our customers at Siftee face, identifying the best sources of data to answer their questions and serve their needs.
- Craft and refine our data acquisition tools, from building new web scrapers to ensuring the reliability and efficiency of our existing data pipelines.
- Develop and deploy scalable solutions to web and data challenges, utilizing both statistical methods and machine learning techniques.
The ideal profile:
- You're passionate about the entire data lifecycle, from discovery and scraping to cleaning and ingestion.
- Hands-on experience with web scraping pipelines, including crafting spiders, bypassing bot prevention strategies, and ensuring data integrity.
- Proficient with popular scraping tools and libraries like BeautifulSoup, XPath, Selenium, Puppeteer, and Splash.
- Adept at extracting data from a variety of sources and formats, including HTML, XML, REST and GraphQL APIs, PDFs, and spreadsheets.
- Skilled in fortifying web scrapers against common obstacles like bot detection, site bans, CAPTCHA challenges, and proxy issues.
- Solid grounding in Object-Oriented Programming, SQL, and Django ORM basics.
Related Job Searches:
- Company: Siftee.io
- Designation: Data Engineer Intern (API and Data Acquisition Focus)
- Profession: IT / Information Technology
- Industry: Computer and IT
- Location: Central Area