The Web unpacked: a quantitative analysis of global Web usage

Read original: arXiv:2404.17095 - Published 4/29/2024 by Henrique S. Xavier

Overview

  • This paper presents a comprehensive analysis of global web usage patterns based on data from SimilarWeb, a leading source for estimating web traffic.
  • The researchers leveraged a dataset of over 250,000 websites to estimate total web traffic and investigate its distribution among domains and industry sectors.
  • The analysis focuses on the characteristics of the top 116 domains, which comprise an estimated one-third of all web traffic.

Plain English Explanation

This paper takes an in-depth look at how people use the web around the world. The researchers used data from SimilarWeb, a company that estimates how much traffic websites receive.

They looked at over 250,000 websites and estimated the total amount of web traffic, as well as how that traffic is spread out across different websites and industries. The researchers paid special attention to the 116 most popular websites, which together account for about one-third of all web traffic.

They examined various aspects of these top websites, such as the types of content they host, whether users need to log in to access them, if they have a physical presence in the real world, and who owns them. The analysis revealed some interesting patterns:

  • Concentration of traffic: A small number of websites capture the majority of web traffic, indicating a significant concentration of usage.
  • Top traffic drivers: The websites that attract the most traffic tend to be search engines, news and media sites, social networks, streaming platforms, and adult content sites.
  • Platform dominance: Much of the web traffic goes to large platforms and to websites owned by companies in the United States.
  • Business models: Much of the traffic goes to for-profit websites that are mostly free to use and are supported by business models that do not charge users directly, such as advertising.

Technical Explanation

The researchers leveraged a comprehensive dataset from SimilarWeb comprising over 250,000 websites to conduct their analysis. They estimated the total web traffic and investigated its distribution across different domains and industry sectors.
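
The summary does not say how traffic beyond the roughly 250,000 sampled sites is accounted for. One common approach, assumed here purely for illustration and not confirmed as the paper's method, is to fit a power law (Zipf-like) to visits versus rank and integrate the fitted curve over the unobserved ranks. The function names and the `total_sites` cutoff are hypothetical, and any input you feed it should be your own data, not figures from the paper.

```python
import math


def fit_power_law(visits):
    """Least-squares fit of log(visits) = log(C) - alpha * log(rank).

    ASSUMPTION: traffic decays as a power law of rank. `visits` holds positive
    counts sorted in descending order, so the first entry has rank 1.
    Returns (C, alpha).
    """
    xs = [math.log(rank) for rank in range(1, len(visits) + 1)]
    ys = [math.log(v) for v in visits]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope


def tail_visits(c, alpha, first_rank, last_rank):
    """Approximate the sum of c * r**-alpha over ranks first_rank..last_rank.

    Uses the integral from first_rank - 0.5 to last_rank + 0.5 (midpoint rule).
    """
    a, b = first_rank - 0.5, last_rank + 0.5
    k = 1.0 - alpha
    if k == 0.0:
        return c * math.log(b / a)
    # expm1 keeps b**k - a**k accurate when alpha is close to 1.
    return c * a ** k * math.expm1(k * math.log(b / a)) / k


def estimate_total(visits, total_sites):
    """Observed visits plus a fitted power-law tail for the unobserved ranks.

    `total_sites` is a hypothetical cutoff on the number of sites; without it
    the tail sum diverges whenever alpha <= 1.
    """
    c, alpha = fit_power_law(visits)
    tail = tail_visits(c, alpha, len(visits) + 1, total_sites)
    return sum(visits) + tail
```

As a sanity check, synthetic visits following `1000 * r**-1.2` for ranks 1 to 100 yield a fitted alpha of about 1.2, and the extrapolated total up to rank 10,000 agrees closely with a brute-force sum. The estimate is only as good as the power-law assumption and the chosen cutoff, both of which should be checked against the real data.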

The core focus of the paper was a detailed examination of the top 116 domains, which were found to account for approximately one-third of all web traffic. The researchers analyzed various attributes of these high-traffic websites, including:

  • Content sources and types: The researchers categorized the websites based on the primary content or services they provide, such as search engines, news and media, social networks, streaming platforms, and adult content.
  • Access requirements: Some websites require users to log in or create an account, while others are freely accessible.
  • Offline presence: The analysis also considered whether the websites had a physical, offline component to their operations.
  • Ownership features: The researchers investigated the ownership structures of the top websites, looking at factors like the country of origin and whether they were owned by for-profit or non-profit entities.
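
Once each top domain is labeled with attributes like those above, the traffic share per attribute value is a weighted tally. The sketch below assumes a hypothetical record layout (`visits`, `category`, `country`, `login_required`) and uses made-up `.example` domains; it is not the paper's data or code.

```python
from collections import defaultdict

# Hypothetical records for illustration only; the domains use the reserved
# .example TLD and the visit counts are invented, not taken from the paper.
DOMAINS = [
    {"domain": "search.example", "visits": 400, "category": "search", "country": "US", "login_required": False},
    {"domain": "video.example", "visits": 250, "category": "streaming", "country": "US", "login_required": False},
    {"domain": "social.example", "visits": 200, "category": "social", "country": "US", "login_required": True},
    {"domain": "news.example", "visits": 100, "category": "news", "country": "GB", "login_required": False},
    {"domain": "shop.example", "visits": 50, "category": "retail", "country": "CN", "login_required": True},
]


def share_by(records, key, total=None):
    """Fraction of traffic per value of `key`, largest share first.

    By default the denominator is the traffic of the listed records; pass an
    estimated `total` to express shares relative to all web traffic instead.
    """
    denominator = sum(r["visits"] for r in records) if total is None else total
    sums = defaultdict(float)
    for r in records:
        sums[r[key]] += r["visits"]
    ranked = sorted(sums.items(), key=lambda kv: kv[1], reverse=True)
    return {value: visits / denominator for value, visits in ranked}


if __name__ == "__main__":
    for attribute in ("category", "country", "login_required"):
        print(attribute, share_by(DOMAINS, attribute))
```

Passing an estimated all-web `total` matters when the listed domains are only a slice of traffic, as with the top 116 domains and their roughly one-third share.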

The key findings of the analysis reveal a significant concentration of web traffic, with a small number of top websites capturing the majority of visits. Search engines, news and media, social networks, streaming, and adult content were identified as the primary drivers of web traffic. The data also suggests a dominance of large platforms and USA-owned websites in terms of overall traffic.
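
Concentration of this kind can be quantified in several ways. The sketch below shows two, as illustrations rather than the paper's own method. The first counts how many top domains are needed to reach a given share of traffic, the kind of quantity behind the "top 116 domains ≈ one-third" figure. The second is the Herfindahl-Hirschman index (HHI), a standard concentration measure from economics that the summary does not mention. The function names are hypothetical and the example data are synthetic.

```python
def domains_needed(visits, target_share, total=None):
    """Smallest k such that the k busiest domains reach `target_share` of traffic.

    `total` defaults to the sum of `visits`; pass an externally estimated
    total to measure shares against all web traffic. Returns None if the
    listed domains never reach the target.
    """
    ordered = sorted(visits, reverse=True)
    denominator = sum(ordered) if total is None else total
    threshold = target_share * denominator
    running = 0.0
    for k, v in enumerate(ordered, start=1):
        running += v
        if running >= threshold:
            return k
    return None


def hhi(visits):
    """Herfindahl-Hirschman index: sum of squared traffic shares (1/n to 1)."""
    total = sum(visits)
    return sum((v / total) ** 2 for v in visits)


if __name__ == "__main__":
    # Synthetic Zipf-like traffic for 1,000 domains (not real data).
    zipf = [1000 / r for r in range(1, 1001)]
    print(domains_needed(zipf, 1 / 3))  # → 7
    print(round(hhi(zipf), 4))
```

The count-to-threshold measure is easy to communicate but depends on the denominator, so it changes if the total traffic estimate changes; the HHI summarizes the whole distribution in one number but is computed here only over the listed domains.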

Interestingly, the researchers found that much of the traffic goes to for-profit but mostly free-of-charge websites, highlighting the prevalence of business models that do not rely on paywalls or direct user fees.

Critical Analysis

The paper provides a comprehensive and insightful analysis of global web usage patterns, leveraging a robust dataset from SimilarWeb. However, there are a few potential limitations and areas for further research that could be considered:

  • Geographical representation: While the dataset encompasses a large number of websites, it may not fully represent web usage patterns in all regions of the world, particularly in developing countries where internet access and usage patterns may differ significantly.
  • Dynamic nature of the web: The web is a constantly evolving ecosystem, and the analysis in this paper reflects a snapshot in time. Longitudinal studies, such as those found in Improved Methodology for Longitudinal Web Analytics Using Common and Curious Rhythms: Temporal Regularities in Wikipedia Consumption, could provide valuable insights into how web usage patterns change over time.
  • User-level analysis: The current analysis is focused on aggregate, domain-level data. Incorporating user-level data, as explored in The Political Economy of Link-Based Web Search, could shed light on individual browsing behaviors and preferences.
  • Assessing causality: While the paper identifies various trends and correlations, it would be valuable to investigate the underlying causal mechanisms that drive the observed web usage patterns, possibly through studies like Through the Lens of Google CrUX: Dissecting Web Browsing Experience Across Devices and Countries.

Overall, this paper offers a comprehensive and insightful snapshot of global web usage patterns, setting the stage for further research to deepen our understanding of this complex and dynamic ecosystem.

Conclusion

This paper presents a detailed analysis of global web usage patterns, leveraging a large dataset from SimilarWeb to uncover key insights about how people interact with the internet. The researchers found a significant concentration of web traffic, with a small number of top websites capturing the majority of visits. Search engines, news and media, social networks, streaming platforms, and adult content were identified as the primary attractors of web traffic, which was also highly concentrated on large platforms and USA-owned websites.

Importantly, the analysis revealed that much of the web traffic goes to for-profit but mostly free-of-charge websites, highlighting the dominance of business models that do not rely on paywalls or direct user fees. These findings have important implications for understanding the dynamics of the web and the role of various content providers and platforms in shaping online behavior and user experiences.

While the paper offers a comprehensive snapshot of global web usage, further research is needed to address potential limitations, such as geographical representation and the dynamic nature of the web. Incorporating user-level data and investigating causal mechanisms could also deepen our understanding of these complex patterns.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

The Web unpacked: a quantitative analysis of global Web usage

Henrique S. Xavier

This paper presents a comprehensive analysis of global web usage patterns based on data from SimilarWeb, a leading source for estimating web traffic. Leveraging a dataset comprising over 250,000 websites, we estimate the total web traffic and investigate its distribution among domains and industry sectors. We detail the characteristics of the top 116 domains, which comprise an estimated one-third of all web traffic. Our analysis scrutinizes various attributes of these domains, including their content sources and types, access requirements, offline presence, and ownership features. Our analysis reveals a significant concentration of web traffic, with a small number of top websites capturing the majority of visits. Search engines, news and media, social networks, streaming, and adult content emerge as primary attractors of web traffic, which is also highly concentrated on platforms and USA-owned websites. Much of the traffic goes to for-profit but mostly free-of-charge websites, highlighting the dominance of business models not based on paywalls.

4/29/2024

Browsing behavior exposes identities on the Web

Marcos Oliveira, Junran Yang, Daniel Griffiths, Denis Bonnay, Juhi Kulshrestha

How easy is it to uniquely identify a person based solely on their web browsing behavior? Here we show that when people navigate the Web, their online traces produce fingerprints that identify them. Merely the four most visited web domains are enough to identify 95% of the individuals. These digital fingerprints are stable and render high re-identifiability. We demonstrate that we can re-identify 80% of the individuals in separate time slices of data. Such a privacy threat persists even with limited information about individuals' browsing behavior, reinforcing existing concerns around online privacy.

6/17/2024

On the Centralization and Regionalization of the Web

Gautam Akiwate, Kimberly Ruth, Rumaisa Habib, Zakir Durumeric

Over the past decade, Internet centralization and its implications for both people and the resilience of the Internet has become a topic of active debate. While the networking community informally agrees on the definition of centralization, we lack a formal metric for quantifying centralization, which limits research beyond descriptive analysis. In this work, we introduce a statistical measure for Internet centralization, which we use to better understand how the web is centralized across four layers of web infrastructure (hosting providers, DNS infrastructure, TLDs, and certificate authorities) in 150 countries. Our work uncovers significant geographical variation, as well as a complex interplay between centralization and sociopolitically driven regionalization. We hope that our work can serve as the foundation for more nuanced analysis to inform this important debate.

7/1/2024

Through the Lens of Google CrUX: Dissecting Web Browsing Experience Across Devices and Countries

Jayasree Sengupta, Tanya Shreedhar, Dinh Nguyen, Robert Kramer, Vaibhav Bajpai

User quality of experience in the context of Web browsing is being researched widely, with plenty of developments occurring alongside technological advances, often driven by big industry players. With the huge reach and infrastructure of Google, the Chrome User Experience Report (CrUX) provides quantitative real-life measurement data of a vast magnitude. Analysis of this steadily expanding dataset aggregating different user experience metrics yields tangible insights into actual trends and developments. Hence, this paper is the first to study the CrUX dataset from the viewpoint of relevant metrics by quantitative evaluation of users' Web browsing experience across three device types and nine European countries. Analysis of data segmented by connection type in the device dimension shows desktops outperforming other device types for all metrics. Similar analysis in the country dimension shows North European countries (Sweden, Finland) having the highest share of 4G connections (85.99% and 81.41%, respectively) and steadily performing 25%-36% better at the 75th percentile across all metrics compared to the worst-performing country. Such a high-level longitudinal analysis of real-life Web browsing experience provides an extensive base for future research.

4/19/2024