10 Advanced CSV Import Features You (Probably) Won’t Launch Yourself
CSV import may on the surface seem simple to build, but launching an advanced importer is a huge undertaking requiring substantial product and engineering resources.
While CSV import may on the surface seem simple to build, there’s a long list of features your customers will request when they get stuck on imports. We frequently see startups think CSV import will be launched after a quick sprint, but get bogged down by the never-ending critical feature requests.
“Two years after we started building CSV import, we prioritized our 4th engineering project to add improvements” said Rohan Sahai, Director of Engineering at Affinity. “When we brought the proposal to leadership, they insisted this be the last time they ever see CSV import on the roadmap. If there are two features we regret homerolling, the first is subscription billing and the second is CSV import.”
For rapidly growing startups with many priorities, the never-ending CSV import project takes away valuable engineering time that could be spent focusing on the core product. From CSV parsing bugs to intuitive error messaging, here are the top 10 features that will keep your engineering team working on CSV import for months (or even years) longer than expected.
From enforcing basic type validations to complex conditional logic, there are endless ways to validate the data you allow into your system. You’ll likely implement the basics, but rogue whitespace, emojis, or missing leading zeros create bugs and customer complaints down the line.
At OneSchema, we’ve compiled the largest library of data validations available, designed for all industries. To implement CSV import, your team will simply need to leverage our no-code data types library to get started. The alternative is to write custom validations for each data type you’ll support now, and continue to add to the list indefinitely in the future for all new types.
Even after implementing robust validations for each data type you need, an additional layer of logic is needed to convert between formats. At OneSchema, we’ve built the platform to automatically convert between any date, number, country code, phone number format, and more. OneSchema can even intelligently infer the original format of a column of data when ambiguous (such as MM/DD/YYYY dates vs DD/MM/YYYY dates).
We’re continuously testing our validators against the world’s messiest CSV data across our entire customer base. There are constantly new edge cases we catch as part of this maintenance process – trust us when we say the possible list of errors to validate against is never-ending!
CSV parsing will be an endless source of bugs, even if you leverage a robust parsing library. Most CSV parsers do not handle different encodings gracefully out of the box and in most cases, your customer will not be familiar with the encoding of their exported CSV file.
To ensure that every CSV file imports properly, OneSchema has byte order mark (or BOM detection) to intelligently switch between parsing strategies for UTF-8, UTF-16, and other Unicode encodings. Also, thinking about supporting Excel? Get ready to gracefully handle legacy Excel formats and implement different parsing strategies for .xls and .xlsx.
In addition, most open-source libraries have incomplete support for handling edge-cases like:
Failures in the parsing step to handle these edge cases almost always make data import impossible, as all future steps rely on a successful parse.
OneSchema gracefully handles these edge cases and more. For example, the platform intelligently identifies the row most likely to be the header row to remove as much friction as possible throughout the mapping step.
When your customer’s upload inevitably has errors that need to be corrected, it’s likely they’ll need to collaborate with other members of their team to resolve the issues. A naive importer will display a list of cryptic error messages. A more advanced implementation might provide clean and helpful error messages, but with an inadequate error summary, the back-and-forth to complete the import still becomes incredibly difficult.
To provide an exceptionally smooth import experience, OneSchema goes a step further and allows users to export an Excel file with highlights and annotations that explain the issues to your customer (video).
The #1 challenge we see with custom-built CSV importers is the lack of an error resolution step. Users trying to map their data to the schema are left with cryptic error messages (or in many cases, no communication at all) as to why their file didn’t upload.
This can create an incredibly frustrating experience for customers, as they’ll either have to iterate on getting the data correct in Excel (which will often reformat their data and create more errors) or escalate a support ticket to your team.
We’ve found that adding the below features drastically reduces customer frustration:
While building a robust error resolution process can be a daunting task from both an engineering and design perspective, you’ll save your customer the round-trip to Excel to clean up their data and drastically improve your onboarding conversion rates. If your CSV import flow is part of customer onboarding, launching with an understandable error resolution step is absolutely critical or you’ll see a substantial drop-off in activation rates. Across a survey of our customer base, we’ve seen a 50+% increase in completed activations with the addition of the error resolution step.
{{blog-content-cta}}
Performance is a huge concern with data imports. Customer spreadsheets can vary drastically in the number of rows and columns, and every additional second of load time can hugely impact conversion rates. Naive approaches will start with loading, validating, and visualizing all of the spreadsheet data at once on the browser.
As spreadsheets get to thousands or millions of rows, however, the complexity scales drastically:
OneSchema is the only available product that supports 1M+ row files in a performant way. Our backend uses an in-memory Rust data service that runs on high-memory servers to load each uploaded spreadsheet into memory. This in-memory approach ensures that all of our data type validations and transformations happen in real-time, regardless of the spreadsheet size.
When the spreadsheet data is ready to import, OneSchema is resilient to outages, as our asynchronous jobs send the spreadsheet data in batches of JSON records to your importer webhook endpoint with exponential backoff on retries. Finally, our embedded spreadsheet frontend virtualizes rows as the user scrolls so the interface remains fast and snappy whether the spreadsheet has hundreds of rows or millions of rows.
Column mapping is another common stage where CSV imports go awry. If your customer maps their columns incorrectly, they’ll either need to contact your team to help undo imports (resulting in a time-consuming task for your engineering team) or get frustrated and abandon the process. Implementing a preview pane to let your customers see the mappings (and errors) will help your customers get their data right the first time.
If your imports have a large number of columns, it’s also slow and tedious for your customer to map the columns correctly. Supporting intelligent fuzzy matching or remembering mappings will give your customer a far smoother import. Our intelligent mapping algorithm has been trained on millions of rows of mapping data to recommend the correct match, every time.
OneSchema also allows you to surface helpful column descriptions and sample data to your customers. Allowing your customers to split and merge columns like first name, last name, and address can save your customers valuable time during the import process.
For highly customizable systems (for example, Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems), you’ll often allow your users to add custom columns within the platform. To support this, you’ll need to be able to initialize a new importer schema for each customer with dynamic validations based on the customer’s configuration.
You may even want to allow customers to create new columns in your system during import. Implementing a UI to empower your customers to add custom columns in a straightforward way during import can be a complex design and engineering project. The feature becomes even more complex if you want to validate data in the newly created columns during import (for example, validate that the column is a NUMBER or COUNTRY CODE type).
OneSchema is the only platform that allows you to validate custom columns by leveraging our large data type validation library.
Every customer onboarding flow needs good reporting and analytics. Given the error-prone nature of doing a data import, many companies point to their data import workflow as the largest source of onboarding support tickets and bug reports.
In order to properly support these customers, customer success and support teams will frequently ask for dashboards to empower the team in seeing what went wrong with an upload. An upload report dashboard enables a support team member to quickly look at the historical list of imports, download the original CSV file, and compare it against the uploaded data to diagnose the data issue at the root of any customer ticket.
Even worse than a support ticket is when bad data makes it into the system. In those cases, customers may request a data import “undo”. In the most severe cases, this can force your team to create a fresh instance with all previously stored data deleted to solve the issue. Showing upload reports greatly improves your internal teams’ ability to help customers understand their own data and enables better scalability of user onboarding.
Depending on your use case, you may want to treat errors and warnings differently during the upload experience. This will require your team to implement custom logic for handling “error” cases vs “warnings” cases. For example, you may want to allow files that actively contain warnings to successfully upload, but not errors.
Flagging only true errors as errors will save your customer a lot of headaches of getting their data to be perfect before being able to upload any data.
Even with a state-of-the-art importer, it’s often helpful for your customer to initially configure their in an Excel spreadsheet. In Excel, however, your customer won’t have any context on your requirements for their imported data (and the errors that will be triggered when they try to import data into your system).
OneSchema enables you to create a self-validating Excel template based on your data model in OneSchema (3 minute video). You can keep it synced with your importer template and allow your customers to download the template from your product’s UI. This makes it even more likely your customer will have a smooth and successful import.
In an ideal world, time and cost aren’t an issue. Product managers, designers, and developers would be able to prioritize every advanced CSV import feature. However, launching an advanced CSV importer is a huge undertaking requiring substantial product and engineering resources. The unfortunate reality is that product teams skip key features, only to be forced to add them back to the roadmap after an onslaught of customer feedback. Months or even years later, they’re still making improvements to CSV import.
If you’re ready to make data imports effortless and efficient, we’d love to talk! Request a demo of OneSchema here or email us at sales@oneschema.co.
While CSV import may on the surface seem simple to build, there’s a long list of features your customers will request when they get stuck on imports. We frequently see startups think CSV import will be launched after a quick sprint, but get bogged down by the never-ending critical feature requests.
“Two years after we started building CSV import, we prioritized our 4th engineering project to add improvements” said Rohan Sahai, Director of Engineering at Affinity. “When we brought the proposal to leadership, they insisted this be the last time they ever see CSV import on the roadmap. If there are two features we regret homerolling, the first is subscription billing and the second is CSV import.”
For rapidly growing startups with many priorities, the never-ending CSV import project takes away valuable engineering time that could be spent focusing on the core product. From CSV parsing bugs to intuitive error messaging, here are the top 10 features that will keep your engineering team working on CSV import for months (or even years) longer than expected.
From enforcing basic type validations to complex conditional logic, there are endless ways to validate the data you allow into your system. You’ll likely implement the basics, but rogue whitespace, emojis, or missing leading zeros create bugs and customer complaints down the line.
At OneSchema, we’ve compiled the largest library of data validations available, designed for all industries. To implement CSV import, your team will simply need to leverage our no-code data types library to get started. The alternative is to write custom validations for each data type you’ll support now, and continue to add to the list indefinitely in the future for all new types.
Even after implementing robust validations for each data type you need, an additional layer of logic is needed to convert between formats. At OneSchema, we’ve built the platform to automatically convert between any date, number, country code, phone number format, and more. OneSchema can even intelligently infer the original format of a column of data when ambiguous (such as MM/DD/YYYY dates vs DD/MM/YYYY dates).
We’re continuously testing our validators against the world’s messiest CSV data across our entire customer base. There are constantly new edge cases we catch as part of this maintenance process – trust us when we say the possible list of errors to validate against is never-ending!
CSV parsing will be an endless source of bugs, even if you leverage a robust parsing library. Most CSV parsers do not handle different encodings gracefully out of the box and in most cases, your customer will not be familiar with the encoding of their exported CSV file.
To ensure that every CSV file imports properly, OneSchema has byte order mark (or BOM detection) to intelligently switch between parsing strategies for UTF-8, UTF-16, and other Unicode encodings. Also, thinking about supporting Excel? Get ready to gracefully handle legacy Excel formats and implement different parsing strategies for .xls and .xlsx.
In addition, most open-source libraries have incomplete support for handling edge-cases like:
Failures in the parsing step to handle these edge cases almost always make data import impossible, as all future steps rely on a successful parse.
OneSchema gracefully handles these edge cases and more. For example, the platform intelligently identifies the row most likely to be the header row to remove as much friction as possible throughout the mapping step.
When your customer’s upload inevitably has errors that need to be corrected, it’s likely they’ll need to collaborate with other members of their team to resolve the issues. A naive importer will display a list of cryptic error messages. A more advanced implementation might provide clean and helpful error messages, but with an inadequate error summary, the back-and-forth to complete the import still becomes incredibly difficult.
To provide an exceptionally smooth import experience, OneSchema goes a step further and allows users to export an Excel file with highlights and annotations that explain the issues to your customer (video).
The #1 challenge we see with custom-built CSV importers is the lack of an error resolution step. Users trying to map their data to the schema are left with cryptic error messages (or in many cases, no communication at all) as to why their file didn’t upload.
This can create an incredibly frustrating experience for customers, as they’ll either have to iterate on getting the data correct in Excel (which will often reformat their data and create more errors) or escalate a support ticket to your team.
We’ve found that adding the below features drastically reduces customer frustration:
While building a robust error resolution process can be a daunting task from both an engineering and design perspective, you’ll save your customer the round-trip to Excel to clean up their data and drastically improve your onboarding conversion rates. If your CSV import flow is part of customer onboarding, launching with an understandable error resolution step is absolutely critical or you’ll see a substantial drop-off in activation rates. Across a survey of our customer base, we’ve seen a 50+% increase in completed activations with the addition of the error resolution step.
{{blog-content-cta}}
Performance is a huge concern with data imports. Customer spreadsheets can vary drastically in the number of rows and columns, and every additional second of load time can hugely impact conversion rates. Naive approaches will start with loading, validating, and visualizing all of the spreadsheet data at once on the browser.
As spreadsheets get to thousands or millions of rows, however, the complexity scales drastically:
OneSchema is the only available product that supports 1M+ row files in a performant way. Our backend uses an in-memory Rust data service that runs on high-memory servers to load each uploaded spreadsheet into memory. This in-memory approach ensures that all of our data type validations and transformations happen in real-time, regardless of the spreadsheet size.
When the spreadsheet data is ready to import, OneSchema is resilient to outages, as our asynchronous jobs send the spreadsheet data in batches of JSON records to your importer webhook endpoint with exponential backoff on retries. Finally, our embedded spreadsheet frontend virtualizes rows as the user scrolls so the interface remains fast and snappy whether the spreadsheet has hundreds of rows or millions of rows.
Column mapping is another common stage where CSV imports go awry. If your customer maps their columns incorrectly, they’ll either need to contact your team to help undo imports (resulting in a time-consuming task for your engineering team) or get frustrated and abandon the process. Implementing a preview pane to let your customers see the mappings (and errors) will help your customers get their data right the first time.
If your imports have a large number of columns, it’s also slow and tedious for your customer to map the columns correctly. Supporting intelligent fuzzy matching or remembering mappings will give your customer a far smoother import. Our intelligent mapping algorithm has been trained on millions of rows of mapping data to recommend the correct match, every time.
OneSchema also allows you to surface helpful column descriptions and sample data to your customers. Allowing your customers to split and merge columns like first name, last name, and address can save your customers valuable time during the import process.
For highly customizable systems (for example, Customer Relationship Management (CRM) and Enterprise Resource Planning (ERP) systems), you’ll often allow your users to add custom columns within the platform. To support this, you’ll need to be able to initialize a new importer schema for each customer with dynamic validations based on the customer’s configuration.
You may even want to allow customers to create new columns in your system during import. Implementing a UI to empower your customers to add custom columns in a straightforward way during import can be a complex design and engineering project. The feature becomes even more complex if you want to validate data in the newly created columns during import (for example, validate that the column is a NUMBER or COUNTRY CODE type).
OneSchema is the only platform that allows you to validate custom columns by leveraging our large data type validation library.
Every customer onboarding flow needs good reporting and analytics. Given the error-prone nature of doing a data import, many companies point to their data import workflow as the largest source of onboarding support tickets and bug reports.
In order to properly support these customers, customer success and support teams will frequently ask for dashboards to empower the team in seeing what went wrong with an upload. An upload report dashboard enables a support team member to quickly look at the historical list of imports, download the original CSV file, and compare it against the uploaded data to diagnose the data issue at the root of any customer ticket.
Even worse than a support ticket is when bad data makes it into the system. In those cases, customers may request a data import “undo”. In the most severe cases, this can force your team to create a fresh instance with all previously stored data deleted to solve the issue. Showing upload reports greatly improves your internal teams’ ability to help customers understand their own data and enables better scalability of user onboarding.
Depending on your use case, you may want to treat errors and warnings differently during the upload experience. This will require your team to implement custom logic for handling “error” cases vs “warnings” cases. For example, you may want to allow files that actively contain warnings to successfully upload, but not errors.
Flagging only true errors as errors will save your customer a lot of headaches of getting their data to be perfect before being able to upload any data.
Even with a state-of-the-art importer, it’s often helpful for your customer to initially configure their in an Excel spreadsheet. In Excel, however, your customer won’t have any context on your requirements for their imported data (and the errors that will be triggered when they try to import data into your system).
OneSchema enables you to create a self-validating Excel template based on your data model in OneSchema (3 minute video). You can keep it synced with your importer template and allow your customers to download the template from your product’s UI. This makes it even more likely your customer will have a smooth and successful import.
In an ideal world, time and cost aren’t an issue. Product managers, designers, and developers would be able to prioritize every advanced CSV import feature. However, launching an advanced CSV importer is a huge undertaking requiring substantial product and engineering resources. The unfortunate reality is that product teams skip key features, only to be forced to add them back to the roadmap after an onslaught of customer feedback. Months or even years later, they’re still making improvements to CSV import.
If you’re ready to make data imports effortless and efficient, we’d love to talk! Request a demo of OneSchema here or email us at sales@oneschema.co.