1. Introduction

This project takes aim at a highly volatile and lesser-known financial market: in-game items. Numerous video game platforms now provide a marketplace where players can buy and sell in-game items that hold real monetary value. Our goal is to build a dataset from marketplace data for an immensely popular game, Counter-Strike: Global Offensive (CS:GO), analyze it with statistical techniques, and discover trends and correlations that could potentially result in net profit.

1.1 Motivation

Similar to the stock market, prices of items for sale on these in-game marketplaces are set dynamically based on various factors (e.g., volume, minimum buy price, minimum sell price, rarity). As a result, there is the potential to make actual financial gains by participating in this market. In brief, these items belong to a market that holds real monetary value, with some virtual items costing thousands of USD.

With the popularity of esports and gaming growing rapidly worldwide, we can expect in-game purchases to increase. Given the abundance of data and how little research exists on in-game markets, we believe that creating the first publicly available in-game market dataset will benefit the gaming and finance communities. Beyond the dataset itself, insight into such marketplaces would help in designing models to predict item values, deciding when to buy or sell, or managing in-game item portfolios as an alternative source of income.

Image of the Steam Community Market and its most expensive items.

2. Data

Because no dataset of virtual items was available, this section describes how we gathered the data and what the resulting dataset contains.

2.1 Data Gathering Process

With no publicly available dataset, we built our own using the Steam market API endpoint, which returns information on every item for a given game, together with NordVPN. We picked Counter-Strike: Global Offensive for its high popularity, strong competitive community, and stable prices. Because the API limits the number of requests, we used NordVPN to switch servers automatically whenever we hit the threshold, and fetched more than 800 MB of data in total.
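The gathering loop described above can be sketched as follows. This is a minimal illustration, not the code used for the project: `fetch_page`, `on_rate_limit`, and `RateLimited` are hypothetical names, the paging parameters are assumptions, and the actual request to the Steam endpoint and the NordVPN server switch are left to the caller.

```python
from typing import Callable, Dict, List


class RateLimited(Exception):
    """Raised by the (hypothetical) fetcher when the API refuses a request."""


def fetch_all_items(
    fetch_page: Callable[[int, int], List[Dict]],
    on_rate_limit: Callable[[], None],
    page_size: int = 100,
    max_retries: int = 5,
) -> List[Dict]:
    """Page through an item listing until an empty page is returned.

    fetch_page(start, count) returns one page of item records.
    on_rate_limit() is called each time the fetcher reports a rate limit;
    in a setup like the one described, this is where a VPN server switch
    would happen (an assumption, not the authors' code).
    """
    items: List[Dict] = []
    start = 0
    while True:
        for _ in range(max_retries):
            try:
                page = fetch_page(start, page_size)
                break
            except RateLimited:
                on_rate_limit()
        else:
            # Every retry for this page was rate limited.
            raise RuntimeError(f"still rate limited after {max_retries} retries")
        if not page:
            return items
        items.extend(page)
        start += len(page)
```

Passing the fetcher and the rate-limit hook as arguments keeps the paging logic independent of any particular HTTP client or VPN tool.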

2.2 Data Set

There are currently no publicly available datasets covering the Steam marketplace, so we had to create our own to gather the information needed for our analysis. We extracted the data directly from the Steam Community Market, filtered to the items of the game we analyze. To do this, we used Beautiful Soup, a Python library for extracting data from HTML and XML files, and stored the extracted records in a cloud-hosted MongoDB database. An example record, with each feature extracted for that item, is shown below.

SIZE INFORMATION:

Total number of records: 9400
Total size of dataset: 690.5 MB
Number of features/record: 12

Example of an item in the database.

3. Methods For Data Analysis

Opening a case returns one of the items on its prize list, so we attempt to forecast a case's price from the price histories of its prize items and the probability of winning each one.

First, we fetch the price history for the case and the list of prizes one can obtain from opening it. Next, we look up the price history for each prize item; since each prize item comes in multiple variations, we collect the price history for each one (e.g., FAMAS | Crypsis (Factory New), FAMAS | Crypsis (Minimal Wear), etc.). We then adjust the price history of each prize item by the probability of receiving that item from the case. Finally, we perform an inner join of the case's and all items' price histories on the timestamp index, which drops any timestamp that is missing from at least one series.
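The steps above can be sketched in plain Python. This is an illustration under stated assumptions: price histories are represented as dictionaries mapping a timestamp to a price, "adjusting by probability" is read as multiplying each prize price by its drop probability and summing, and the function name and sample values are hypothetical.

```python
from typing import Dict, Tuple

PriceHistory = Dict[str, float]  # timestamp -> price (assumed representation)


def join_case_with_prizes(
    case_prices: PriceHistory,
    prize_prices: Dict[str, PriceHistory],
    prize_probs: Dict[str, float],
) -> Dict[str, Tuple[float, float]]:
    """Return timestamp -> (case price, probability-weighted prize value).

    Only timestamps present in the case series and in every prize series
    are kept, which mirrors an inner join on the timestamp index.
    """
    common = set(case_prices)
    for series in prize_prices.values():
        common &= set(series)

    joined = {}
    for ts in sorted(common):
        # Assumption: weight each prize price by its drop probability.
        weighted = sum(
            prize_probs[name] * series[ts]
            for name, series in prize_prices.items()
        )
        joined[ts] = (case_prices[ts], weighted)
    return joined


# Made-up sample data, for illustration only.
case = {"2020-01-01": 1.0, "2020-01-02": 1.2, "2020-01-03": 1.1}
prizes = {
    "Item A": {"2020-01-01": 10.0, "2020-01-02": 12.0},
    "Item B": {"2020-01-02": 2.0, "2020-01-03": 3.0},
}
probs = {"Item A": 0.25, "Item B": 0.75}
joined = join_case_with_prizes(case, prizes, probs)
```

In this sample only `2020-01-02` appears in all three series, so it is the only timestamp that survives the join.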

4. Results

This section discusses the results for some of the questions we asked of our dataset. It does not cover every attempt or every detail presented in the full document.

4.1 Is there a correlation between the price and sales volume of cases?

We tested for a correlation between the price of a case and the number of units sold. This could be valuable information if we treat cases as investments and work to generate a net profit from buying and selling them.
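As an illustration of how such a correlation can be measured, the sketch below computes Pearson's correlation coefficient between two equally long series. The text does not state which correlation measure was used, so Pearson's r is an assumption here, and the function name is hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

Calling `pearson(prices, volumes)` on a case's daily prices and daily sales volumes gives a value between -1 and 1; values near 0 suggest no linear relationship between the two.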

Examples and correlation of price to volume of sales.

4.2 Is there a correlation between the prices of cases and the items they contain?

We look for a correlation between the price history of a case and the price histories of the items that can be won from it. Our results show a high correlation between different items, along with similar price behavior. This could let us identify items whose prices can be used to predict the value of a case.

Correlation between case prices and the prices of the items they contain.

4.3 Is there a type of item that is traded more than others (weapons, keys, cases, etc.)?

By identifying the items that are traded most frequently, we can build models that focus on those items so that trades complete faster.
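A ranking of item types by trade volume can be sketched as below. The record layout is an assumption: the field names `type` and `volume` are hypothetical and may not match the features stored in the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def volume_by_type(records: Iterable[Dict]) -> List[Tuple[str, int]]:
    """Sum trade volume per item type, most traded first."""
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        # Assumed fields: "type" (e.g. weapon, key, case) and "volume".
        totals[record["type"]] += record["volume"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

The first entries of the returned list are the item types a trading model would prioritize under this criterion.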

Visualization of commonly-traded/sold items.

5. Summary

This work provides insight into the Steam market through the first dataset of a Steam-based economy and the first analysis that uses it. We provide further detail on our observations and on other factors that may ultimately be profitable to an investor with an open mind.