Data Collection

How do you collect data that's representative of each resale market?

Importance of Data

Barker's proprietary valuation models are supervised machine learning models. Each model learns the relationship between its input features, the characteristics of an asset relevant to its valuation, and its target, the asset's historical transaction price.
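
As a minimal sketch of this setup (the column names, input file, and choice of a gradient-boosted regressor are illustrative assumptions, not Barker's actual schema or models), a supervised valuation model could be trained as follows:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Hypothetical feature columns: characteristics of each asset relevant to valuation.
FEATURES = ["brand", "model_year", "condition_grade", "days_since_release"]
TARGET = "sale_price"  # the historical transaction price the model learns to predict

transactions = pd.read_csv("historical_transactions.csv")

# One-hot encode categorical characteristics so the regressor can consume them.
X = pd.get_dummies(transactions[FEATURES])
y = transactions[TARGET]

# Hold out a test split to estimate how well the learned relationship generalizes.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingRegressor().fit(X_train, y_train)
print(f"Holdout R^2: {model.score(X_test, y_test):.3f}")
```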

These relationships are learned from historical product sale transactions, so the effectiveness of Barker's models is highly dependent on the quality and comprehensiveness of the input data. For this reason, Barker treats data veracity, completeness, and consistency as top priorities. To this end, our team has more than thirty years of combined experience in data engineering, data science, data partnerships, and data-as-a-service products.

Data Collection Methodologies

Barker collects, standardizes, and featurizes historical resale transaction data through a productionized data collection system that gathers new transactions daily.
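
The sketch below illustrates one shape such a daily job could take; the endpoint URL, query parameters, and response structure are hypothetical, not Barker's actual sources.

```python
import datetime as dt

import requests

# Hypothetical marketplace endpoint; each real source has its own collector.
SOURCE_URL = "https://api.example-marketplace.com/sales"

def collect_new_transactions(since: dt.date) -> list[dict]:
    """Fetch all sale events recorded since the last collection run."""
    response = requests.get(SOURCE_URL, params={"since": since.isoformat()})
    response.raise_for_status()
    sale_events = response.json()
    # A sale event can contain several transactions; flatten them into one list.
    return [txn for event in sale_events for txn in event["transactions"]]

if __name__ == "__main__":
    # A scheduler such as cron or Airflow would invoke this once per day.
    yesterday = dt.date.today() - dt.timedelta(days=1)
    print(f"Collected {len(collect_new_transactions(yesterday))} transactions")
```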

The collection system automatically searches for new sale events and collects all of their associated transactions. To clean and featurize this data reliably, Barker has developed data pipelines that process transaction data from online sources into a standardized, featurized format consumable by our machine learning systems.
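
A minimal sketch of one such pipeline step is shown below; the raw column names and derived features are assumptions for illustration, since every source has its own raw format.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from one source's raw columns to the standardized schema.
COLUMN_MAP = {"price_usd": "sale_price", "sold_at": "sale_date", "item": "asset_name"}

def standardize(raw: pd.DataFrame) -> pd.DataFrame:
    """Clean one source's raw transactions into the standardized, featurized format."""
    df = raw.rename(columns=COLUMN_MAP)
    # Cleaning: coerce types and drop records with an unusable price or date.
    df["sale_price"] = pd.to_numeric(df["sale_price"], errors="coerce")
    df["sale_date"] = pd.to_datetime(df["sale_date"], errors="coerce")
    df = df.dropna(subset=["sale_price", "sale_date"])
    # Featurization: derive model-ready features from the cleaned fields.
    df["sale_month"] = df["sale_date"].dt.month
    df["log_price"] = np.log(df["sale_price"])
    return df
```

Because every source is mapped into the same standardized columns, downstream featurization and modeling code stay identical regardless of where a transaction originated.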

Data Partnerships

Barker partners with marketplaces to offer white-labeled services, and it also holds data partnership agreements with these marketplaces. These agreements provide faster access to cleaned and standardized data, as well as bespoke pricing, as Barker scales its services across different resale asset classes.

Software Development Best Practices

Each data source added to Barker's systems requires a data pipeline to standardize it. To ensure consistency between data sources and to accelerate development, Barker follows software engineering best practices, namely generalizability, scalability, and extensibility of the codebase and infrastructure. This is achieved through peer review, test-driven development, CI/CD, and cloud hosting; a sketch of the test-driven approach follows below.
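
As one illustration of that test-driven approach, a pipeline step like the standardization sketch above would ship with unit tests; the module name and fixture values here are hypothetical.

```python
import pandas as pd

from pipeline import standardize  # hypothetical module holding the sketch above

def test_standardize_drops_unpriced_rows():
    """A test runner such as pytest discovers and runs this automatically."""
    raw = pd.DataFrame({
        "price_usd": [100.0, None],
        "sold_at": ["2024-01-02", "2024-01-03"],
        "item": ["widget", "gadget"],
    })
    result = standardize(raw)
    # Records without a sale price must never reach the machine learning systems.
    assert len(result) == 1
    assert result.iloc[0]["sale_price"] == 100.0
```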
