Data Engineering Intern

Role: Data Engineering Intern at Symbium

Reports To: Engineering Lead

Revolutionizing Public Access to Information about Properties in the US

At Symbium, we make it easy for the masses to interact with and understand the consequences of laws and regulations through our web applications. We are currently focused on streamlining the regulatory review of residential construction projects in the US. Unfortunately, data about properties is hard to access, and this limits our ability to scale Complaw® web services for the public and local governments.

The public's access to property data is currently hampered by several bottlenecks:

  1. Data about properties (including basic information about property assessments) is unavailable, not easily accessible, and/or expensive to access.
  2. A holistic view of a property may only be formed by synthesizing information from multiple sources, with different file formats, vocabulary, and granularity of information.
  3. Unlike searching on Google or Amazon, access to information about a property, when available, is gated by the need to specify an identifier for the property, e.g., an address or a tax identifier such as an APN.
  4. Data obtained from official sources often conflicts, and there isn't an easy way for users either to understand the provenance of the property information displayed or to request changes.

To alleviate these bottlenecks, and to make it easy and engaging for the public to access and search property-related information, Symbium is building a first-of-its-kind, publicly accessible, free-to-use property information portal for the US.

A big part of this ambitious project involves setting up a sustainable data pipeline for different states and municipalities. We are looking for software and data engineering interns to help us bridge this information gap between the public and the government.

As an intern, you would work with our software engineers and product managers to (1) expand the coverage and quality of the data that powers our Complaw® applications and (2) set up a sustainable data ingestion and maintenance pipeline for property-related data from authoritative sources.

Role & Responsibilities

  • Find authoritative sources of structured data across multiple geographies and create scripts to extract this data.
  • Work closely with our software engineers to set up a data transformation pipeline to clean and normalize the collected data.
  • Build internal tools to streamline our ETL processes.

Experience & Qualifications

Who you are:
  • A team player who is passionate about being part of a tight-knit and nimble team.
  • Strong skills in scripting languages and runtimes (e.g., Node.js, Python).
  • An understanding of how to use regular expressions effectively.
  • Knowledge of data parsing and cleaning libraries.
  • Able to meet tight deadlines.
  • Resourceful and excited about exploring creative ways of solving problems and learning on the job.
  • An understanding of the significance of good software development practices such as unit testing, profiling, and prototyping.

Nice to haves:
  • Experience working with geospatial data or libraries, e.g., GDAL, Turf.
  • Knowledge of data analysis tools, e.g., pandas or NLP tools.
  • Experience working with large datasets.
  • Background in using technology to solve problems in the regulatory space.

This is a US-based consulting position. Symbium is an equal opportunity employer.