What STR Data Exists?
What Short-Term Rental (STR) Data Exists for Collection?
The STR industry has been growing, attracting more investors and vendors and increasing the demand for data to stay ahead of rising competition in the space. But what data exists, and how can it be leveraged to stay ahead of the competition? We have been scraping STR data since 2012 to help build STR products and companies, and we can help you understand the data landscape.
Main Forms of Data
There are three main forms of data that can be leveraged in the STR space: sourced data, scraped data, and agency or government statistics.
Sourced Data
Sourced data is collected directly from the source: property managers may choose to upload their data to aggregators, which show their performance against others in their market.
Sourced data is typically more accurate and complete than other forms of data, but it can be more time-consuming and expensive to collect. There is also selection bias in who reports, and the population willing to share its data is much smaller. Sourced data is often anonymized in reporting tools, which lack features to drill down to the individual property level, and because relatively few managers report in, there are limited ways to pivot the data.
Because sourced data is aggregated, the results are typically market-level reports on headline KPIs such as ADR (average daily rate), occupancy, RevPAN (revenue per available night), available nights and demand comparisons.
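For readers new to these KPIs, here is a minimal sketch of how the headline figures relate to each other. The MarketPeriod fields are hypothetical, and providers differ on details such as whether fees are counted in revenue or owner blocks are removed from available nights, so treat these as the textbook definitions rather than any specific aggregator's method.

```python
from dataclasses import dataclass


@dataclass
class MarketPeriod:
    """Aggregated figures for one market over one period (hypothetical schema)."""
    revenue: float          # total rental revenue (assumed to exclude fees and taxes)
    booked_nights: int      # nights sold to guests
    available_nights: int   # nights the listings could have been booked


def kpis(p: MarketPeriod) -> dict[str, float]:
    # ADR: average rate earned per booked night
    adr = p.revenue / p.booked_nights if p.booked_nights else 0.0
    # Occupancy: share of available nights that were booked
    occupancy = p.booked_nights / p.available_nights if p.available_nights else 0.0
    # RevPAN: revenue per available night, which equals ADR x occupancy
    revpan = p.revenue / p.available_nights if p.available_nights else 0.0
    return {"adr": adr, "occupancy": occupancy, "revpan": revpan}


print(kpis(MarketPeriod(revenue=30_000.0, booked_nights=150, available_nights=250)))
# {'adr': 200.0, 'occupancy': 0.6, 'revpan': 120.0}
```

Because RevPAN is ADR multiplied by occupancy, it rewards a manager who balances rate and fill rather than maximizing either one alone.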
Scraped Data
Scraped data is collected from websites or other online sources such as Airbnb, Vrbo, Booking or individual manager websites. This can be done manually or with software. Scraped data can be a quick and easy way to get a lot of data, and it offers the widest view of what is out there.
While scraped data covers every listing in the market, it is important to keep its limitations in mind. Because the data is scraped, individual reservation rates and the type of unavailability (block or booking) must be inferred with AI and machine learning models trained against a healthy volume of sourced reservations. There will be some natural error, but as long as the median error stays within a reasonable bound, the data can unlock massive advantages.
Scraped Data - Listing Information
Scraped listing-level data is the highest-confidence data because it consists of the public-facing attributes of the property. It requires no inference or modeling, since it is the property's advertisement itself.
The data points include:
Currency, Cleaning Fee and Instant Bookability (available or not, check-in, minimum stay by date, etc.)
Property Name and Description
Zip Code, City, State, Country, Latitude & Longitude (decimal places are often dropped from the coordinates so the property cannot be pinpointed)
Room type and property type
Whether a license is required by the local municipality, and the license number (an optional field hosts are not required to fill in)
Number of bedrooms and bathrooms, maximum occupancy (adults & children) and whether pets are allowed
Amenities (the list can run to 100+ items, from kitchen utensils to a fire pit); major amenities such as washer/dryer, WiFi, kitchen, AC and heat are always included
Location Features: Pool, Hot Tub, Waterfront, Beach access or frontage, Ski in or out
Primary Photo, photo URLs and number of photos
Security Deposit required or not
Guest Review Count, Score by Category
Scraped Data - Manager Information
Scraped data also includes information on the property's manager. When choosing a property, guests care a great deal about who manages it and the level of service they will receive.
These data points include:
Manager Name and Hosting Start
Manager Response Time and Rate
Superhost, preferred partner status, etc.
Language Spoken
Scraped Data - Calendar Information
The calendar is one of the most important features to scrape. While the listing gives you attributes by which to aggregate, filter or combine your data, the calendar provides the insights that lead to performance metrics and revenue.
These data points by calendar date include:
Available or Not
Price or Rate
Minimum Stay
Check-in permitted or check-out permitted
With calendar data, a single scrape cannot capture everything you need for revenue and performance analysis. You need to scrape every listing daily in order to identify changes to the calendar.
This is a complex process, and a strong data scraping partner can save you hundreds of thousands of dollars in staffing and server costs, plus the year or more it takes to build a useful scraper. This is where Hungry Robots can help. Additional information on our calendar scraping data can be found here.
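To illustrate why daily scrapes matter, here is a minimal sketch of comparing two consecutive snapshots of one listing's calendar. It assumes a simplified, hypothetical snapshot shape (calendar date mapped to an availability flag and a nightly price); a production pipeline tracks more fields (minimum stay, check-in rules) and handles dates rolling on and off the calendar window.

```python
from datetime import date

# Hypothetical snapshot shape: calendar date -> (available, nightly price)
Snapshot = dict[date, tuple[bool, float]]


def diff_calendars(yesterday: Snapshot, today: Snapshot) -> dict[str, list[date]]:
    """Classify what changed between two daily scrapes of the same listing."""
    changes: dict[str, list[date]] = {
        "became_unavailable": [],
        "became_available": [],
        "price_changed": [],
    }
    # Only dates present in both scrapes can be compared
    for day in sorted(yesterday.keys() & today.keys()):
        was_open, old_price = yesterday[day]
        is_open, new_price = today[day]
        if was_open and not is_open:
            changes["became_unavailable"].append(day)   # booked or blocked since yesterday
        elif not was_open and is_open:
            changes["became_available"].append(day)     # cancellation or released block
        elif is_open and old_price != new_price:
            changes["price_changed"].append(day)        # rate update on an open night
    return changes


d1, d2, d3 = date(2024, 7, 1), date(2024, 7, 2), date(2024, 7, 3)
yesterday = {d1: (True, 250.0), d2: (True, 250.0), d3: (False, 250.0)}
today = {d1: (False, 250.0), d2: (True, 275.0), d3: (True, 250.0)}
print(diff_calendars(yesterday, today))
```

A single scrape only shows which nights are closed; it is the day-over-day diff that reveals when a night closed and at what price it was last offered.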
Scraped Data - Calculated Values
Because scraped calendar data is limited and raw, the most important information is calculated on top of it. Tracking price changes, when dates are no longer available and which dates became unavailable together will help you build a robust view of STR performance.
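Tracking "which dates became unavailable together" can be sketched as grouping the nights that closed in the same scrape into contiguous runs. This is a hypothetical helper, not Hungry Robots' logic: each run is only a candidate stay, since it may be one booking, several back-to-back bookings or an owner block.

```python
from datetime import date, timedelta


def group_runs(days: list[date]) -> list[tuple[date, int]]:
    """Group dates that closed in the same scrape into (first_night, nights) runs."""
    runs: list[tuple[date, int]] = []
    for day in sorted(set(days)):
        if runs:
            start, nights = runs[-1]
            # Extend the current run if this date directly follows it
            if start + timedelta(days=nights) == day:
                runs[-1] = (start, nights + 1)
                continue
        runs.append((day, 1))  # otherwise start a new run
    return runs


newly_unavailable = [date(2024, 8, 10), date(2024, 8, 11), date(2024, 8, 12), date(2024, 8, 20)]
print(group_runs(newly_unavailable))
# [(datetime.date(2024, 8, 10), 3), (datetime.date(2024, 8, 20), 1)]
```

The run length is the raw material for length-of-stay estimates, which is why it matters which nights closed together rather than only how many closed.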
To calculate revenue, occupancy, RevPAN and more, you will need a very solid booked-and-blocked model to infer which dates were booked as reservations and which were blocked due to owner holds or maintenance. Hungry Robots uses sourced reservations on over 250,000 listings to model the probability that an unavailable date is either a booking or a block. With that information, Hungry Robots data can be leveraged for occupancy, revenue and pacing information. If you want to learn more about how Hungry Robots models this, schedule time with an expert!
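Once each unavailable night carries a probability of being a booking, performance can be estimated as a probability-weighted sum. The sketch below assumes such a probability (the hypothetical p_booked field) already exists; it does not reproduce how Hungry Robots produces it. It follows one common convention of removing expected owner blocks from sellable nights before computing occupancy.

```python
from dataclasses import dataclass


@dataclass
class Night:
    available: bool   # from the calendar scrape
    price: float      # last scraped nightly rate before the date closed
    p_booked: float   # hypothetical model output: P(unavailable night is a booking)


def expected_performance(nights: list[Night]) -> dict[str, float]:
    """Probability-weighted booked nights, revenue and occupancy for one listing."""
    booked = blocked = revenue = 0.0
    for n in nights:
        if n.available:
            continue  # still open: counts only toward sellable nights
        booked += n.p_booked              # expected booked nights
        blocked += 1.0 - n.p_booked       # expected owner/maintenance block
        revenue += n.p_booked * n.price   # expected revenue at the last seen rate
    sellable = len(nights) - blocked      # assumption: blocks are not sellable
    occupancy = booked / sellable if sellable else 0.0
    return {"booked_nights": booked, "revenue": revenue, "occupancy": occupancy}


nights = [
    Night(False, 200.0, 0.75),
    Night(False, 200.0, 0.75),
    Night(False, 180.0, 0.5),
    Night(True, 180.0, 0.0),
]
print(expected_performance(nights))
```

Weighting by probability, instead of thresholding each night as booked or blocked, keeps market-level totals unbiased when the model is well calibrated.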
With an accurate booked-and-blocked model that includes booking dates, the following calculations can be done:
Occupancy, RevPAN, Rent, Revenue
Length of Stay (LOS)
Pacing on Occupancy, RevPAN, Rent, Revenue and LOS
Booking windows, changes in booking behavior and estimated remaining demand
Growth, Trends, Same Store Sales (SSS) performance
Probability of booking models
The possibilities are really endless...
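As an example of the list above, here is a minimal sketch of LOS, booking window and a simple pacing figure (nights on the books for a target month as of a given date), computed from inferred stays. The Stay record and function names are hypothetical; in practice the booked_on date would come from the day a run of nights closed on the calendar.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Stay:
    booked_on: date   # inferred date the nights closed on the calendar
    check_in: date
    check_out: date


def los(stay: Stay) -> int:
    """Length of stay in nights."""
    return (stay.check_out - stay.check_in).days


def booking_window(stay: Stay) -> int:
    """Days between booking and arrival."""
    return (stay.check_in - stay.booked_on).days


def nights_on_books(stays: list[Stay], as_of: date, month_start: date, month_end: date) -> int:
    """Nights inside [month_start, month_end) already booked as of `as_of`."""
    total = 0
    for s in stays:
        if s.booked_on > as_of:
            continue  # not yet booked at the as-of date
        start, end = max(s.check_in, month_start), min(s.check_out, month_end)
        total += max((end - start).days, 0)
    return total


stays = [
    Stay(date(2024, 5, 1), date(2024, 6, 1), date(2024, 6, 4)),
    Stay(date(2024, 5, 20), date(2024, 6, 10), date(2024, 6, 17)),
]
# Average LOS and average booking window, both in days
print(mean(los(s) for s in stays), mean(booking_window(s) for s in stays))
```

Pacing then compares nights_on_books at the same number of days before arrival this year and last year, which is how you can tell whether a month is filling faster or slower than usual.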
To ensure the market is not double or triple counted, Hungry Robots also provides another modeled dataset in the data feed: our Crosswalk file, which maps the same listing across multiple distribution channels. When measuring a market's performance, it is important to adjust for or remove these duplicates to avoid double or triple counting. However, if you are building a pricing model, you may want to intentionally overweight multi-channel listings, in which case leaving the duplicates in will do that for you.
With a good crosswalk file, you can identify units or homeowners that are on only one channel, channel importance by market, price premiums or differing booking behaviors across channels, and upside opportunity in a target acquisition. The channels themselves can use it to allocate marketing dollars to markets where they have not fully penetrated the inventory. Crosswalk files help you identify where the total addressable market (TAM) is being marketed, how much of it is offline and more!
With an accurate crosswalk file, the following calculations can be done:
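The deduplication idea can be sketched with a toy crosswalk. Here each scraped listing row carries a hypothetical unit_id standing in for the crosswalk mapping; the real Crosswalk file's format is not described in this article, so the row shape is an assumption.

```python
from collections import defaultdict

# Hypothetical rows: (channel, listing_id, unit_id). unit_id links the same
# physical unit across channels, as a crosswalk would.
listings: list[tuple[str, str, str]] = [
    ("airbnb", "a1", "u1"),
    ("vrbo", "v9", "u1"),
    ("booking", "b4", "u1"),
    ("airbnb", "a2", "u2"),
    ("vrbo", "v3", "u3"),
]


def unique_units(rows: list[tuple[str, str, str]]) -> int:
    """Count physical units rather than channel listings."""
    return len({unit for _, _, unit in rows})


def channels_per_unit(rows: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Map each unit to the set of channels it is listed on."""
    channels: dict[str, set[str]] = defaultdict(set)
    for channel, _, unit in rows:
        channels[unit].add(channel)
    return dict(channels)


print(len(listings), unique_units(listings))  # 5 raw listings, 3 physical units
single_channel = sorted(u for u, ch in channels_per_unit(listings).items() if len(ch) == 1)
print(single_channel)  # ['u2', 'u3']
```

Counting the five raw rows would overstate supply by two thirds in this toy market, which is exactly the double and triple counting a crosswalk is meant to remove.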
Manager Inventory Count, Inventory Makeup & Locations
Manager Rent, Growth Rate/Churn, Reviews etc
Manager Revenue Management sophistication (Rate schedule, channel distribution, rate parity etc)
Manager Performance against Market or YoY
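One of the manager metrics above, rate parity, can be sketched by comparing the scraped nightly price of one crosswalked unit across channels. The input shape and the 5% tolerance are hypothetical, and the sketch assumes prices are in the same currency and comparable (fees treated consistently), which real channel data often is not.

```python
from datetime import date

# Hypothetical prices for ONE crosswalked unit: channel -> {date: nightly price}
prices: dict[str, dict[date, float]] = {
    "airbnb": {date(2024, 9, 1): 200.0, date(2024, 9, 2): 210.0},
    "vrbo": {date(2024, 9, 1): 200.0, date(2024, 9, 2): 240.0},
}


def parity_breaks(prices: dict[str, dict[date, float]], tolerance: float = 0.05) -> list[date]:
    """Dates where the highest and lowest channel prices differ by more than `tolerance` (fraction of the low price)."""
    if not prices:
        return []
    # Only compare dates quoted on every channel
    shared = set.intersection(*(set(by_date) for by_date in prices.values()))
    breaks = []
    for day in sorted(shared):
        quotes = [by_date[day] for by_date in prices.values()]
        low, high = min(quotes), max(quotes)
        if low > 0 and (high - low) / low > tolerance:
            breaks.append(day)
    return breaks


print(parity_breaks(prices))  # [datetime.date(2024, 9, 2)]
```

Aggregating the share of nights with parity breaks across a manager's crosswalked inventory gives one rough signal of revenue management sophistication.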
Agency or Government Statistics
The last form of STR data comes from destination marketing organizations (DMOs), tourism boards and federal government bodies such as the US Census Bureau. These organizations collect survey data, tax receipts and census information, publish data at the block group, city or state level, and can offer a long history of data.
For example, the US Census publishes the number of seasonally vacant homes in different jurisdictions, which provides a larger TAM figure than what is listed on channels like Airbnb and Vrbo. Not all seasonally vacant properties are made available for rent, and of those that are, not all are listed on every channel. While Hungry Robots offers a robust crosswalk file to measure the total listed market, the US Census figure provides a good upper bound on available inventory.
Another example is the Tourism Board of Hawaii, which shares visitor arrivals by country of origin and the county they fly into. Because Hawaii is a fly-in destination where all visitors pass through government surveys, a large amount of tourism data is collected and shared.
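The Census comparison described in this section reduces to simple arithmetic once listings are deduplicated. This is a hypothetical helper with illustrative numbers only (not real market data); treating the Census count as a ceiling is an approximation, since the Census and STR channels define housing units differently.

```python
def inventory_headroom(unique_listed_units: int, seasonal_vacant_homes: int) -> tuple[float, int]:
    """Compare deduplicated STR inventory with a Census seasonal-vacancy count.

    Returns (share_listed, unlisted_headroom). Assumes the Census figure acts
    as an approximate upper bound on rentable inventory.
    """
    if seasonal_vacant_homes <= 0:
        raise ValueError("seasonal_vacant_homes must be positive")
    share = unique_listed_units / seasonal_vacant_homes
    headroom = max(seasonal_vacant_homes - unique_listed_units, 0)
    return share, headroom


# Illustrative numbers only: 1,200 deduplicated listed units vs. 4,000 seasonally vacant homes
print(inventory_headroom(1_200, 4_000))  # (0.3, 2800)
```

Running the comparison on raw listing counts instead of crosswalked units would inflate the listed share, so deduplicate first.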
Uses of STR Data
While the uses of STR data are practically limitless, we have put together a few examples below.
Market research
Pricing
Marketing
Property management
Trends Reporting
Forecasting
For more information, we have a full article here on what can be built with and leveraged from STR data.
If this article has you excited to get your hands on STR data, Hungry Robots is happy to help. Click here to schedule time with an expert to learn more about how Hungry Robots can help your STR data needs.