Our project analyzes the features behind successful apps. Our goal is to help developers and Google build apps that many users will enjoy, increasing revenue in the process. We analyze apps in general to identify the key variables associated with high download counts. We also group apps by combinations of variables to identify the ideal app types.
The data we used is a Google Play Store dataset containing over 10,000 observations and 12 variables describing app characteristics such as reviews, category, and genres. We removed variables that we did not expect to contribute significantly to our analysis, such as “last updated” and “Android Version”. We also converted the Type column, which indicates whether an app is Free or Paid, to a binary column to facilitate analysis.
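The preprocessing described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the exact procedure we ran: the sample rows are invented, the column names ("Last Updated", "Android Version", "Type") are assumptions about the file layout, and the encoding 0 = Free, 1 = Paid is one possible choice.

```python
import csv
import io

# Invented sample rows for illustration only; the real dataset's exact
# column names and values are assumptions here.
SAMPLE_CSV = """App,Category,Rating,Reviews,Installs,Type,Price,Last Updated,Android Version
Photo Editor,ART_AND_DESIGN,4.1,159,"10,000+",Free,0,"January 7, 2018",4.0.3 and up
Pro Camera,PHOTOGRAPHY,4.5,1200,"50,000+",Paid,$2.99,"March 2, 2018",4.1 and up
"""

# Columns judged unlikely to contribute to the analysis (assumed names).
DROP_COLUMNS = {"Last Updated", "Android Version"}


def preprocess(rows):
    """Drop unused columns and encode Type as 0 (Free) or 1 (Paid)."""
    cleaned = []
    for row in rows:
        kept = {key: value for key, value in row.items() if key not in DROP_COLUMNS}
        kept["Type"] = 1 if kept["Type"].strip().lower() == "paid" else 0
        cleaned.append(kept)
    return cleaned


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
apps = preprocess(rows)

for app in apps:
    print(app["App"], app["Type"])
```

Encoding Type as an integer lets it be used directly as a numeric feature in later steps.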
Our methods include descriptive analysis and clustering analysis. To understand how our data could be grouped, we used K-means clustering, which grouped the data into 3 clusters reflecting differences in installs, reviews, and ratings. We then identified the top categories and price distributions of the apps within those clusters to explain the grouping. We used our findings to build a forecasting analysis using K-nearest neighbors and regression trees.
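To make the clustering step concrete, here is a minimal sketch of K-means (Lloyd's algorithm) with 3 clusters. It is an illustration under stated assumptions rather than our actual pipeline: the nine data points are invented, the features are assumed to be (log10 installs, log10 reviews, rating), and a deterministic farthest-point initialization is swapped in for the random initialization that library implementations typically use, so that the result is reproducible.

```python
import math

# Invented feature vectors: (log10 installs, log10 reviews, rating).
POINTS = [
    (3.0, 1.0, 3.8), (3.2, 1.3, 4.0), (2.9, 0.9, 3.5),  # few installs
    (5.0, 3.0, 4.2), (5.3, 3.4, 4.1), (4.8, 2.9, 4.3),  # moderate installs
    (8.0, 6.0, 4.5), (7.7, 5.8, 4.4), (8.3, 6.4, 4.6),  # many installs
]


def farthest_point_init(points, k):
    """Deterministic start: take the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


centroids, labels = kmeans(POINTS, k=3)
print(labels)
```

Installs and reviews are log-scaled in this sketch because raw counts span several orders of magnitude and would otherwise dominate the Euclidean distance; whether and how to scale the features is a design choice that affects the resulting clusters.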
After compiling the results of our analysis, we identified the characteristics of a successful application, giving developers a framework and tips for building a successful application and presenting it on the Play Store.
Introduction