In the Market for Data Science Software? Ask Vendors These Questions Before You Buy.

By Gregg Spivack

Vice President of Client Services, NPI

October 21, 2019

Data science software is a hot market. How hot? Right now, VCs are racing to back the hypergrowth happening across the vendor landscape, and new AI solutions are cropping up daily to solve problems ranging from inventory forecasting to fraud detection (and everything in between).

Meanwhile, enterprise interest in data science solutions is just starting to percolate. If your business falls in the “interested” category, you’re probably wondering how to navigate the sea of vendors and offerings in what is still a nascent solution category.

In this post, we share a few questions to help you more clearly define your data science software requirements so you can select the right vendor and negotiate effectively. Our focus is primarily on the tools for authoring data science and analytics products rather than specific industry or vertical solutions.

3 Questions For Your Data Science Software Vendor Shortlist

Does your software focus more on prototyping or production?

(Spoiler: It can only be one.)

Every vendor will claim that its tool covers the full data science lifecycle, that is, BOTH prototyping and production work. But the data science process is more iterative than traditional software development: planning only gets you so far, and what you learn from the data will often change or completely redirect the project’s objectives. For that reason, “production” data science work should be kept separate from prototyping.

Explore further by asking, “Who is the target user of the solution?” Is it users of traditional point-and-click tools such as Excel or Alteryx? Is it analysts who code in R, Python, SAS, or Stata? If the answer to either question is yes, the tool is likely aimed at prototyping.

If the vendor leads with Git integration or CI/CD (continuous integration/continuous delivery), or refers to users writing in Java, Python, Scala, or similar languages, odds are that it is targeting “production” use cases.

How can the software integrate with how my analysts and data scientists already do work?

(Spoiler: It better!)

There is a universal truth in IT process management: once people know how to do something one way, they need a very compelling reason to learn a new process. If your analysts use Excel, Alteryx, Dataiku, Trifacta, Tableau Prep, or some other tool to prepare data, you should assume they will not switch. For that reason, any new software should integrate tightly with these tools rather than assume you can sunset them.

Take an inventory of your data access, data prep, model building and visualization tools. An example stack might be:

  • Access: Alation
  • Prep: Trifacta/Alteryx/Domino Data Lab
  • Modeling: DataRobot/Domino Data Lab
  • Visualization: Tableau

Then require each vendor to demonstrate how someone using those tools would use its solution to make their existing workflow more productive.

How do your pricing models and quantity discounts align with our budget and deployment requirements?

(Spoiler: Whatever the answer, validate.)

Vendors price their software in several ways. Most modern, cloud-first suppliers base at least part of their pricing on consumption of their solution. A consumption-based pricing model is the best way to align the price the customer pays with the value the software delivers. It also tends to make selling the solution internally go a bit more smoothly: if the data science software is not adopted, there is no big price tag. Modern companies such as Databricks and Snowflake, like the major cloud providers, lead with these “buy-the-drink” pricing structures.

The alternatives to consumption pricing are server-based or user-based subscriptions, in which the total price depends on the size or number of servers or on the number of users, respectively. If a vendor prices this way, do some quick back-of-the-envelope calculations at the maximum server size/count or user count you expect, and make sure the resulting figure makes sense relative to the business value you anticipate. This “worst case” calculation is also relevant when you are considering a switch from a legacy vendor such as SAS or Teradata to a newer vendor, as you compare costs and benefits.
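To make the worst-case check concrete, here is a minimal Python sketch. Every price, quantity, and the expected business value in it are hypothetical placeholders rather than figures from any vendor, and the function names are invented for illustration. It computes the maximum annual cost under per-user and per-server subscriptions, a consumption cost at low adoption, and flags any scenario whose cost exceeds the value you expect.

```python
# Hypothetical back-of-the-envelope comparison of pricing models.
# All prices, quantities, and the expected value are illustrative assumptions,
# not real vendor quotes.


def user_subscription_cost(max_users: int, price_per_user_per_year: float) -> float:
    """Worst-case annual cost under per-user subscription pricing."""
    return max_users * price_per_user_per_year


def server_subscription_cost(max_servers: int, price_per_server_per_year: float) -> float:
    """Worst-case annual cost under per-server subscription pricing."""
    return max_servers * price_per_server_per_year


def consumption_cost(compute_hours: float, price_per_hour: float) -> float:
    """Annual cost under consumption ("buy-the-drink") pricing: you pay only for usage."""
    return compute_hours * price_per_hour


def makes_sense(annual_cost: float, expected_annual_value: float) -> bool:
    """Rough sanity check: does the expected business value exceed the cost?"""
    return expected_annual_value > annual_cost


if __name__ == "__main__":
    expected_value = 500_000.0  # hypothetical annual business value

    scenarios = [
        ("per-user, worst case", user_subscription_cost(200, 3_000.0)),
        ("per-server, worst case", server_subscription_cost(10, 40_000.0)),
        ("consumption, low adoption", consumption_cost(1_000, 20.0)),
    ]
    for label, cost in scenarios:
        verdict = "OK" if makes_sense(cost, expected_value) else "review"
        print(f"{label:28s} ${cost:>10,.0f}  {verdict}")
```

With these made-up numbers, the per-user worst case ($600,000) exceeds the assumed $500,000 of annual value, which is exactly the kind of result that should prompt a harder conversation about quantity discounts. The consumption scenario illustrates the point above: low adoption means a low bill.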