by J. Robert Taylor, J.D.
There has been a lot of news about "big data" and real estate lately. Zillow's potential acquisition of Trulia for $3.5 billion is the latest example on the residential side. On the commercial and investment side, CoStar Group has acquired LoopNet and Apartments.com over the past few years for a total of over $1 billion.
Typical categories of real estate data include public records from the county recorder, the property tax assessor's records, building permits from the city and multiple listing service data. Who are the players in real estate big data today? There are pure data creators, and then there are hybrids that both enable the creation of data and harvest existing data, reformatting it for their audience.
First, let's look at the pure data creators. These are individuals or agents who input data into a proprietary or public database. The multiple listing service is a great example. Agents and sellers fill out numerical fields designed for the real estate industry (such as price, square footage, lot size, bedrooms and baths) and descriptive fields showing features (such as the style of the home, flood zone information and details on fixtures and appliances).
This is also where pictures and videos of the property are uploaded. There is often space for more extensive and detailed information as well, such as property inspections, pest reports, environmental disclosures and other disclosures.
Property owners themselves are pure data creators when they use websites like Craigslist, Zillow, LoopNet or Trulia to rent or sell their property. In doing so they enter data and pictures much as an agent would on the multiple listing service. Craigslist is the most egalitarian of the third-party sites, and it lets owners format data in almost any way they want. Additionally, Craigslist is unique in that it does not resell the data or make its money from online advertising.
Other pure data creators include county recorders' offices, which record and track all property sales transactions, including sales price and date of sale. They also record and track all mortgage activity with respect to real property. City and county building departments also store a lot of data on building permits, cost of work, use permits, variances and the like. Many government agencies are now putting such information online without restriction.
Second, there are the hybrids, which include sites such as Zillow, CoStar/LoopNet and Trulia. These companies primarily harvest data from the pure data creators. They reformat that data and then sell the resulting site views and clicks to advertisers on the Web. Alternatively, hybrids sell subscriptions to users who want access to the data. Make no mistake: those are valuable site views and clicks, as the vast majority of buyers and sellers look to some third-party hybrid for information on the real estate market. These hybrids know that homebuyers and homeowners will go where the data is most accurate, complete and easily obtained.
How could Trulia be worth $3.5 billion? It created an easy-to-use platform that engages users, and it markets site views and clicks to agents and other third parties wanting to reap business from new customers. All businesses thrive on repeat business. If you can get one user to engage multiple times on multiple platforms (think Facebook, Instagram, Twitter, Snapchat, etc.), then the potential for advertising to that valuable user grows significantly. All these hybrids want subscribers whose identity is linked to the hybrid's website, mobile site and social site. More clicking equals more money: billions, apparently.
The hybrids can also resell your interest in real estate to related businesses. A person who buys a new home often buys appliances, home furnishings, moving services and the like. If these businesses can market directly to you through Trulia and Zillow, they have an entry point to compete for very low-hanging fruit.
Let's not forget about the pure data creators. If they suddenly decide to put their data elsewhere and profit from it themselves, then the hybrids, which are now getting the data for pennies or for free, will be left out in the cold. Without the data they are empty shells, much like newspaper real estate classified sections are today.
How would that impact the real estate market? In my opinion, it would take a week for buyers and sellers to find wherever the pure data creators chose to share their data. Google finds these things in expert fashion. The hybrid data aggregators take advantage of a highly fragmented and disorganized real estate brokerage community that has traditionally been behind the curve when it comes to tech. Perhaps pure data creators will realize they are sitting on billions and do something about it.