One of the key benefits of business analytics software like Panintelligence is that you can build a model from your data set and apply it to incoming or existing data to predict the future behaviour of a customer, a product or a device.
In this illustration, pi developer Alex Paramore feeds in publicly available data about films to demonstrate how pi's BI reporting tools can produce interesting visualizations, surface unexpected correlations and, ultimately, predict whether a movie will be any good.
Take a dataset and see if you can use it to predict whether you would like a film or not.
Alex started by collecting openly available IMDb data. (He got the data from Kaggle, which provides an open forum for datasets and has all sorts of interesting data for download.)
The available information included each movie's run length, date of release, age certification, genre, director, country of origin, language and streaming platform, as well as rating scores from both IMDb and Rotten Tomatoes.
Alex imported the data into his pi database directly from an Excel sheet using pi's data connection. (This could just as easily be done from a database such as Oracle or SQL Server.)
To define whether he would like a film or not, Alex chose to treat any film with an IMDb viewer rating of 7 or more as a ‘good’ film and one he would like.
Using the analytics chart, he set an objective to find films with scores of 7 or more. Running it through pi, Alex could see straight away that, of the approximately 16,000 films in his data set, 80% had a rating of less than 7.
In other words, based on the rules he had determined, 80% of these films were not movies he would like. Next, he wanted to find what characteristics the 20% of films he would like (those with scores of 7 or more) shared.
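The labelling rule and the baseline split described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how pi works internally: the rows, titles and field names (`imdb_rating`) are invented stand-ins for the real data set.

```python
# Toy rows standing in for the IMDb data; titles and ratings are invented.
films = [
    {"title": "Film A", "imdb_rating": 8.1},
    {"title": "Film B", "imdb_rating": 6.4},
    {"title": "Film C", "imdb_rating": 7.0},
    {"title": "Film D", "imdb_rating": 5.2},
    {"title": "Film E", "imdb_rating": 6.9},
]

GOOD_THRESHOLD = 7.0  # the post's rule: a rating of 7 or more counts as "good"


def label_good(film, threshold=GOOD_THRESHOLD):
    """Return True when the film meets the 'good' rating threshold."""
    return film["imdb_rating"] >= threshold


def share_good(rows, threshold=GOOD_THRESHOLD):
    """Fraction of rows labelled good; 0.0 for an empty list."""
    if not rows:
        return 0.0
    return sum(label_good(r, threshold) for r in rows) / len(rows)


baseline = share_good(films)
print(f"{baseline:.0%} of the toy films are labelled good")
```

On the real data the same calculation would give the roughly 20% baseline that every later finding is compared against.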
Interestingly, pi revealed that a film’s run time was the most significant factor in whether it would be rated highly. Of the films with a run time greater than 111 minutes, 33% were good films. Digging deeper into the data, Alex was able to see that if the film was over 129 minutes, the chance of a high rating rose to 40%.
Next, Alex wanted to see what other characteristics of a film were statistically important in its rating. He discovered that the streaming platform on which the film was available was another highly significant factor. If the film was available to stream on Disney Plus, the chance of the film being rated highly rose to 74%.
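A run-time finding of this kind is a conditional rate: the share of good films among only those longer than some cutoff. The sketch below shows that calculation on invented toy rows; the field names (`runtime_min`, `good`) are assumptions, and pi's own method for choosing cutoffs such as 111 or 129 minutes is not described in this post.

```python
# Toy rows; run times and labels are invented for illustration.
films = [
    {"runtime_min": 95, "good": False},
    {"runtime_min": 105, "good": False},
    {"runtime_min": 112, "good": False},
    {"runtime_min": 118, "good": True},
    {"runtime_min": 125, "good": False},
    {"runtime_min": 131, "good": True},
    {"runtime_min": 140, "good": False},
    {"runtime_min": 150, "good": True},
]


def good_rate_above(rows, min_runtime):
    """Share of good films among those longer than min_runtime minutes.

    Returns None when no film is longer than the cutoff.
    """
    subset = [r for r in rows if r["runtime_min"] > min_runtime]
    if not subset:
        return None
    return sum(r["good"] for r in subset) / len(subset)


# Compare the rate for all films (cutoff 0) with the two cutoffs from the post.
for cutoff in (0, 111, 129):
    rate = good_rate_above(films, cutoff)
    print(f"run time > {cutoff} min: {rate:.0%} good")
```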
So, we now start to see how powerful insights can be gained from even fairly simple data: if a film is over two hours long and on Disney Plus, Alex will most probably like it.
By increasing the granularity and adding a director as a further criterion, Alex was able to raise the chance of him liking the film to 83%.
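Stacking criteria like this amounts to narrowing the data with several conditions at once and recomputing the share of good films. Here is a minimal sketch under stated assumptions: the rows, the director names and the field names are all invented, and the 120-minute cutoff simply mirrors the "over two hours" wording above.

```python
# Toy rows; platforms, directors and labels are invented for illustration.
films = [
    {"runtime_min": 135, "platform": "Disney Plus", "director": "Director X", "good": True},
    {"runtime_min": 128, "platform": "Disney Plus", "director": "Director X", "good": True},
    {"runtime_min": 122, "platform": "Disney Plus", "director": "Director Y", "good": False},
    {"runtime_min": 90, "platform": "Disney Plus", "director": "Director Y", "good": True},
    {"runtime_min": 140, "platform": "Other", "director": "Director X", "good": False},
    {"runtime_min": 100, "platform": "Other", "director": "Director Y", "good": False},
]


def over_two_hours(film):
    return film["runtime_min"] > 120


def on_disney_plus(film):
    return film["platform"] == "Disney Plus"


def by_director_x(film):
    return film["director"] == "Director X"


def good_rate(rows, *conditions):
    """Share of good films among rows that satisfy every condition."""
    subset = [r for r in rows if all(cond(r) for cond in conditions)]
    if not subset:
        return None
    return sum(r["good"] for r in subset) / len(subset)


print(good_rate(films, over_two_hours))
print(good_rate(films, over_two_hours, on_disney_plus))
print(good_rate(films, over_two_hours, on_disney_plus, by_director_x))
```

Each extra condition shrinks the subset, so on real data it is worth checking how many films remain before trusting the percentage.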
As a self-proclaimed data-geek, Alex wanted to split the data out further and create a second model. So next, he split out the data into genres, languages and countries.
Immediately he was able to see from the data that documentaries tended to be higher rated films. If a film was a documentary, it immediately had a 50% chance of scoring over 7. And if it was a documentary about music, it had a 70% chance of scoring over 7.
Going back to his first model, if a film was a documentary on the topic of music and was over 1.5 hours long, there was a very good probability, 81%, that it would be a good film.
He also showed how you can exclude data: if you’re not interested in documentaries, you can easily filter them out and create a model that is restricted to everything else.
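Excluding a category before modelling is just a filter applied ahead of the rate calculation. A small sketch, assuming each film carries a hypothetical `genres` list (the rows are invented):

```python
# Toy rows; titles, genres and labels are invented for illustration.
films = [
    {"title": "Film A", "genres": ["Documentary", "Music"], "good": True},
    {"title": "Film B", "genres": ["Drama"], "good": False},
    {"title": "Film C", "genres": ["Documentary"], "good": True},
    {"title": "Film D", "genres": ["Comedy", "Drama"], "good": True},
]


def exclude_genre(rows, genre):
    """Keep only the films that do not list the given genre."""
    return [r for r in rows if genre not in r["genres"]]


def good_rate(rows):
    """Share of good films in rows, or None for an empty list."""
    if not rows:
        return None
    return sum(r["good"] for r in rows) / len(rows)


no_docs = exclude_genre(films, "Documentary")
print("all films:", good_rate(films))
print("without documentaries:", good_rate(no_docs))
```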
Using this model, you could take the data for newly released films that are yet to be rated on IMDb and predict whether each film is likely to be good or not, based on its characteristics, with a surprisingly high level of accuracy.
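To make the prediction step concrete, here is a rough sketch that turns the percentages quoted in this post into a simple rule table and applies it to unrated films. The post does not describe how pi scores new records, so the design is an assumption: rules are checked from the highest quoted percentage down, the first match wins, and a film is predicted good when its estimate reaches a hypothetical 50% cutoff. The field names and example films are invented.

```python
# (estimated chance of a rating of 7 or more, condition), using figures
# quoted in the post. Ordering from highest to lowest is an assumption.
RULES = [
    (0.81, lambda f: {"Documentary", "Music"} <= set(f["genres"]) and f["runtime_min"] > 90),
    (0.74, lambda f: f["platform"] == "Disney Plus"),
    (0.70, lambda f: {"Documentary", "Music"} <= set(f["genres"])),
    (0.50, lambda f: "Documentary" in f["genres"]),
    (0.40, lambda f: f["runtime_min"] > 129),
    (0.33, lambda f: f["runtime_min"] > 111),
]
BASELINE = 0.20  # share of good films across the whole data set


def estimate_good(film):
    """Return the estimate from the first matching rule, else the baseline."""
    for probability, matches in RULES:
        if matches(film):
            return probability
    return BASELINE


def predict_good(film, cutoff=0.5):
    """Predict 'good' when the estimate reaches the (assumed) cutoff."""
    return estimate_good(film) >= cutoff


# Invented examples of newly released, not yet rated films.
new_films = [
    {"title": "New A", "genres": ["Documentary", "Music"], "runtime_min": 100, "platform": "Other"},
    {"title": "New B", "genres": ["Drama"], "runtime_min": 135, "platform": "Other"},
    {"title": "New C", "genres": ["Comedy"], "runtime_min": 88, "platform": "Other"},
]

for film in new_films:
    print(film["title"], f"{estimate_good(film):.0%}", predict_good(film))
```

A rule table like this is far cruder than a fitted model, but it shows the shape of the workflow: learn rates from rated films, then look up the rate for each new film's characteristics.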
Alex’s movie model is, of course, a fun example of how we can use data to make predictions. But it illustrates how even a small number of fairly simple data points, when run through powerful business analytics software like pi, can produce very valuable insights for your business, allowing you to make data-driven decisions and increase the likelihood of being able to forecast the future.
This post is based on a webinar by Alex Paramore of Panintelligence. You can watch the whole webinar here. And if you want to see how pi could help you make valuable business predictions, you can try pi for free here.