How we learned to love firmographics, say no to the social firehose, and sell insights … not data

As data geeks know, the volume and velocity of content generated on social networks and media sites are producing a wealth of information.  Marketers and sales teams are finally learning how to make sense of this deluge, which has significantly increased demand for the data.  Companies such as DataSift, Apple Inc.’s Topsy, Twitter’s Gnip and NTT Data have emerged as intermediaries between the data sources (e.g. Facebook, Twitter, LinkedIn…) and the sales and marketing folks who use the data.

In the past few weeks, the relative calm in this industry has been upset by a number of significant changes, most notably Twitter’s decision not to renew DataSift’s access to its data firehose.  While many people in the industry anticipated this, it is symbolic of a weakness in the social media data industry’s model and strategy.  What warrants the concern?  Each of these intermediary data providers has independently negotiated licensing agreements with Facebook, Twitter and others.  More often than not, the terms are unfavorable and ultimately expose intermediaries to the whim and discretion of the social networks.  Risky business.

Moreover, it is no secret that Facebook, Twitter and LinkedIn are working to improve the value and digestibility of their user-generated data through better data structures and internal analytics.  Facebook, for example, just launched Topic Data, a keyword-based, anonymized data set and self-serve tool.  It’s aimed at giving marketers deeper insights into the themes and topics that individuals are engaging with.  Initiatives like Topic Data are just the tip of the iceberg.  As more projects like this come online, they will erode the value that intermediaries can offer.

At Batavia Research, this recent turbulence has reaffirmed our approach to data collection and data delivery.  When we started working on our technology in 2013, the industry was changing quickly.  As the LinkedIn co-founder, Reid Hoffman, said, “…in a changing world, playing it safe is one of the riskiest things you can do.”  We knew we wanted to do things differently and not expose ourselves to the unilateral discretion of Facebook or Twitter just to gain short-term access to their data without long-term guarantees.

The major differences between Batavia Research and other data intermediaries can be summarized as three paradigm shifts.  Together, they have insulated us from the industry’s recent turbulence.
  1. company-centric instead of people-centric

One of Batavia Research’s earliest decisions was to focus on profiling companies and not individuals.  This gave us a competitive advantage on a few levels.  First, we didn’t have to compete in the personal or consumer data space, which has quickly become oversaturated.  Second, profiling companies is not limited by the same privacy regulations and terms-of-use restrictions that are associated with personal data.

  2. profile companies based on their digital fingerprint … not the social firehose

Most social media data providers profile an individual or company by collecting enormous amounts of data from the social firehose.  To add context to this data, it is then attributed to a user handle or resolved into a topic or theme.  While this approach helps add some context, it is limited to attribution only within the social network that the data is derived from.

As an example, DataSift’s data might indicate that a status update on Twitter was posted by the user @hoovers, but that does not tell me which real commercial entity @hoovers is.  Is it Hoovers Vacuums, Dun & Bradstreet’s Hoovers or Hoover’s Subs Sandwiches in Valier, Pennsylvania?

Batavia Research addresses the attribution problem by not looking to the firehose for context.  Instead, we programmatically map companies to their social handles and profiles on channels such as Facebook, Twitter, Google+, LinkedIn and over 180 others.  This map is referred to as a company’s “digital fingerprint.”  Once social media data is collected, the digital fingerprints are used as a context overlay to attribute that data to commercial entities.
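
To make the idea of a context overlay concrete, here is a minimal sketch in Python.  It is an illustration only, not Batavia Research’s actual implementation: the company names, handles, and the `Fingerprint`, `build_index` and `attribute` names are all invented for the example, and the real matching logic is certainly more involved.  The sketch assumes a fingerprint is simply a company plus its handle on each channel, and that a collected post carries a channel and a handle.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Fingerprint:
    """Hypothetical digital fingerprint: one company and its handle per channel."""
    company: str
    handles: Dict[str, str]  # channel name -> handle on that channel


def normalize(channel: str, handle: str) -> Tuple[str, str]:
    """Lower-case both parts and drop a leading '@' so lookups are consistent."""
    return channel.strip().lower(), handle.strip().lstrip("@").lower()


def build_index(fingerprints: Iterable[Fingerprint]) -> Dict[Tuple[str, str], str]:
    """Invert company -> handles into (channel, handle) -> company."""
    index: Dict[Tuple[str, str], str] = {}
    for fp in fingerprints:
        for channel, handle in fp.handles.items():
            index[normalize(channel, handle)] = fp.company
    return index


def attribute(post: dict, index: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Return the company behind a post, or None if the handle is unmapped."""
    return index.get(normalize(post["channel"], post["handle"]))


# Invented example data; none of these companies or handles are real.
FINGERPRINTS = [
    Fingerprint("Acme Vacuums Ltd.", {"twitter": "@acmevac", "facebook": "acmevacuums"}),
    Fingerprint("Acme Subs & Sandwiches", {"twitter": "@acme_subs"}),
]
INDEX = build_index(FINGERPRINTS)

post = {"channel": "Twitter", "handle": "@Acme_Subs", "text": "Two-for-one today!"}
print(attribute(post, INDEX))
```

The point of the design is that the lookup key is built from the company side, before any social data arrives, so a post from an unmapped handle returns nothing rather than a guess.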

  3. diversified data sources instead of a few firehoses

When characterizing a company, Batavia Research considers its entire online presence and activity.  This requires a diverse set of data sources that goes beyond social networks and media sites to include easy payment tools (e.g. Square), blogs, review sites, company websites and even domain registrars.  Not only does this approach significantly mitigate the risk of relying on a select few firehoses, it also improves data quality and depth.  In fact, Batavia Research collects over 350 company-centric attributes.

About Batavia Research

Batavia Research collects digital firmographic data that enables a range of solutions for business information providers, the financial services industry and sales & marketing.  This is achieved via BR’s proprietary technology that discovers, gathers and characterizes a company’s digital presence and marketing activities. Underpinning BR is a highly automated, scalable platform that evaluates the digital presence and marketing activities of companies and brands. A language-agnostic approach provides global reach and insights. Data is updated weekly and transformed into numerous metrics, historic dimensions, and various proxies and indices. Partners of BR have access to their data via import/export or API data streaming capabilities. Connect with Batavia Research on LinkedIn and Twitter.

By David van der Roest