Brigham Young University, Provo, Utah, United States
The widespread electronic transmission of pornography allows for a variety of new data sources to objectively measure pornography use. Recent studies have begun to use these data to rank order US states by per capita online pornography use and to identify the determinants of pornography use at the state level. The aim of this paper is to compare two previous methodologies for evaluating pornography use by state, as well as to measure online pornography use using multiple data sources. We find that state-level rankings from Pornhub.com, Google Trends, and the New Family Structures Survey are significantly correlated with each other. In contrast, we find that rankings based on data from a single large paid subscription pornography website have no significant correlation with rankings based on the other three data sources. Since so much online pornography is accessed for free, research based solely on paid subscription data may yield misleading conclusions.
Keywords: Pornography, internet use, data, representative
While most researchers would agree that pornography has become more pervasive in recent decades, accurately measuring the level of pornography use in the population remains an empirical challenge for social scientists. The array of technologies used to access pornography has changed over time, making it almost impossible to consistently measure the same metric of pornography use. High-speed internet, which has penetrated markets gradually over the last fifteen years, enables unprecedented affordability, anonymity, and ease of access in pornography consumption (Cooper, 1998), contributing to the apparent general rise in pornography use (Wright, 2011). Hertlein and Stevenson (2010) also identify other features of broadband internet pornography that contribute to the industry's growth: closer approximation to the physical world, acceptability, ambiguity, and accommodation between one’s “real” and “ought” self.
Past approaches to measuring pornography use have relied heavily on survey data (see Buzzell, 2005). The electronic nature of online pornography, however, increasingly makes possible a number of alternative methods for obtaining reliable proxies of pornography use, including those gathered from subscription or online search data. The ability to use an objective measure based on subscription or search data is advantageous because survey-based data generally suffer from social desirability bias: respondents may underreport activities that violate social norms (Fisher, 1993). In addition, subscription data do not depend on an individual’s opinion about what constitutes pornography, a natural limitation of subjective survey questions about pornography use.
Two recent studies have tapped into innovative sources of data about online pornography use. Edelman (2009) uses subscription data from a single top-ten provider of paid pornographic content to create a ranking of which states use the most online pornography and correlates these with several state-level measures of social or religious attitudes. MacInnis and Hodson (2014) use Google Trends search term data as a proxy for pornography use and examine the relationship between state-level pornography use and measures of religiosity and conservatism. They find that states with more right-leaning ideological attitudes have higher rates of pornography-related Google searches.
This paper assesses some of the claims made in past studies about the rank order of states and the relationship between state-level pornography use and various state-level social measures. We also provide a framework that future researchers can use to assess the representativeness of future state-level or even county-level datasets about pornography use. Edelman (2009) was a pioneer in accessing the subscription data of a single provider of paid pornographic content, and this use of individual consumer data from private companies is likely to become a useful tool for gathering data on hard-to-measure behavior. Key to the future use of this type of rich data will be identifying the degree to which the data from a single firm can provide the same insights as a nationally representative sample.
In this paper, we expand on the data used in these two recent studies and combine it with two additional data sources. Since each of the four data sources we use yields a measure of the level of pornography use, we assess the validity of each source by comparing its state-level ranking against the rankings we obtain from the other sources.
Our paper draws on four data sources that include information on state-level variation in pornography use. The first two data sources are nationally representative samples while the last two are based on paid subscriptions or page views connected to a specific provider of pornographic content. In each data source our measure of pornography use is based on circumstances in which individuals seek out pornographic content rather than accidentally viewing pornography.
Our first dataset is based on a nationally representative sample of 2,988 respondents in the New Family Structures Survey (NFSS). The data collection was conducted by Knowledge Networks (KN), a research firm with a record of generating high-quality data. Knowledge Networks recruited members of its panel randomly through telephone and mail surveys, and households were provided with internet access if needed. This panel has the advantages that it is not limited to current internet users or computer owners and does not accept self-selected volunteers.
The NFSS includes a question about whether the respondent intentionally viewed pornography in the previous year. This type of question has the advantage of capturing pornography use through whatever source the individual uses to access it. Other nationally representative samples, such as the General Social Survey, also include questions about pornography. We use the NFSS data because they can be easily accessed by other scholars and include state identifiers in their publicly available form. In contrast, state identifiers are available only in the confidential version of the General Social Survey. For the analysis in this paper, we use the forty-six states in the NFSS for which there were at least 50 respondents.
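The state-level NFSS measure amounts to the share of respondents in each state who report intentional viewing, kept only where the state sample is large enough. A minimal sketch, assuming respondent records arrive as hypothetical `(state, viewed)` pairs and that shares are unweighted (the text does not say whether survey weights were applied); the function name is illustrative, not the authors' code:

```python
from collections import defaultdict

MIN_RESPONDENTS = 50  # per-state threshold described in the text


def state_viewing_shares(records, min_n=MIN_RESPONDENTS):
    """Share of respondents per state who report intentionally viewing
    pornography in the past year, for states with at least `min_n` respondents.

    `records` is an iterable of (state, viewed) pairs, `viewed` a bool.
    Shares are unweighted (an assumption: no survey weights are applied).
    """
    counts = defaultdict(lambda: [0, 0])  # state -> [viewers, respondents]
    for state, viewed in records:
        counts[state][1] += 1
        if viewed:
            counts[state][0] += 1
    return {
        state: viewers / n
        for state, (viewers, n) in counts.items()
        if n >= min_n
    }


# Hypothetical records: 60 respondents in UT, only 10 in RI.
records = [("UT", True)] * 20 + [("UT", False)] * 40 + [("RI", True)] * 10
shares = state_viewing_shares(records)  # RI is dropped (10 < 50)
```

If the weighted NFSS estimates were wanted instead, each respondent's survey weight would replace the unit increments above.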
The second data source, Google Trends, functions as a time series index of the volume of searches entered into Google in a specific geographic area. These data have proven useful in economic and medical applications such as predicting influenza outbreaks (Carneiro & Mylonakis, 2009) and forecasting short-term economic indicators such as consumer confidence or unemployment (Choi & Varian, 2012). Preis, Moat, and Stanley (2013) quantify trading behavior using Google Trends, showing that search volume for certain terms is linked with increases or decreases in stock prices. The adult entertainment industry can likewise be examined with Google Trends search data, to the extent that important features of the industry can be measured quantitatively.
The most important challenge in using Google Trends data is selecting the specific terms on which to draw data. The selected terms must be genuine indicators of pornography use for our analysis to be useful. Ho and Watters (2004) analyzed structural trends in pornographic websites and, as part of their analysis, created a list of terms that appear frequently on pornographic websites and are often absent from non-pornographic websites. The top four terms were “porn”, “xxx”, “sex”, and “f***”. Using search statistics, we find that searches for these four terms are highly correlated. In contrast, searches for the term “pornography” are uncorrelated with any of these four terms, and this term is more likely to be used by people seeking information about pornography than by people accessing pornographic content.
There is also a distinction between “hard” and “soft” pornography, with “soft” generally referring to media that is sexual in nature but does not depict penetration. The four terms listed above capture only users seeking hard content, but we still consider this an effective measure for two reasons. First, soft pornography is not considered pornography by many viewers, and as a result it is pervasive even in mainstream media, including television and movies. Second, we find that searches for soft pornography terms are minimal compared with searches for hard pornography terms. We compared the relative search volume for the terms “porn” and “nude girls” over 2005-2013. Both series were normalized jointly so that the maximum search volume, which occurred for the term “porn”, took the value 100. Relative to this maximum, “nude girls” never has a search volume index greater than 6.
The data from Google Trends do not indicate the actual number of searches for a specific term in a geographic area. Each data point is normalized by dividing the number of searches for the term by the total number of all searches in that area. The data is therefore controlled for both population and the differences in search volume among states. Google Trends also eliminates repeated searches by a single individual in a short period of time to prevent a single individual from skewing the results.
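The normalization described above can be illustrated with a small sketch: divide each area's searches for a term by all searches in that area, then rescale so that the highest share equals 100. Google does not publish raw counts, so the inputs here are hypothetical, and `trends_style_index` is an illustrative name, not part of any Google API. The sketch also ignores Google's sampling and privacy thresholds.

```python
def trends_style_index(term_counts, total_counts):
    """Scale each area's search share for one term so the maximum share is 100.

    term_counts, total_counts: dicts mapping area -> raw search counts
    (hypothetical inputs; Google Trends does not expose raw counts).
    """
    shares = {area: term_counts[area] / total_counts[area] for area in term_counts}
    peak = max(shares.values())
    return {area: round(100 * share / peak) for area, share in shares.items()}


term = {"A": 50, "B": 30, "C": 5}      # hypothetical searches for one term
total = {"A": 1000, "B": 300, "C": 200}  # hypothetical searches of any kind
index = trends_style_index(term, total)
```

Here area A has the most raw searches for the term but only half of B's search share, so the result is A = 50, B = 100, and C = 25. This is the sense in which the index adjusts for population and overall search volume.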
Google Trends data are available at the state-week level, and we use data from July 2013 to July 2014. Our observations are adjusted to a 1-100 scale: the state-week with the highest normalized searches for a specific term in our dataset has a reading of 100. Using these data, we construct an index of pornography searches for each state-week as a weighted sum of the four terms. We weight “porn” and “sex” more heavily because their relative search volumes are much greater than those for “f***” and “xxx”. Specifically, each term's weight is its mean relative search volume over the past year. We then rank states by this weighted search index to model the geography of the adult entertainment industry.
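One way to read the weighting described above is sketched below. It assumes, which the text does not spell out, that each term's weight is its mean value over the year in a single joint comparison of the terms (so all terms share one 0-100 scale), with the weights rescaled to sum to 1. The function names and numbers are hypothetical.

```python
from statistics import mean


def term_weights(joint_series):
    """Weight per term: mean weekly value from ONE joint comparison of all
    terms (assumed), normalized so the weights sum to 1.

    joint_series: dict term -> list of weekly national values.
    """
    means = {term: mean(values) for term, values in joint_series.items()}
    total = sum(means.values())
    return {term: m / total for term, m in means.items()}


def combined_index(scores_by_term, weights):
    """Weighted sum of per-term 0-100 scores for each key present in every term.

    scores_by_term: dict term -> dict key -> score; a key can be (state, week).
    """
    keys = set.intersection(*(set(scores) for scores in scores_by_term.values()))
    return {
        key: sum(weights[term] * scores_by_term[term][key] for term in weights)
        for key in keys
    }


# Hypothetical inputs with two terms and two state-weeks.
weights = term_weights({"porn": [100, 80], "sex": [60, 60]})  # about 0.6 and 0.4
scores = {
    "porn": {("UT", 1): 50, ("MS", 1): 100},
    "sex": {("UT", 1): 20, ("MS", 1): 40},
}
index = combined_index(scores, weights)
```

Normalizing the weights only rescales the index. It does not change the ranking of states.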
One advantage of using data from Google Trends rather than website-specific subscription data is that it captures individuals seeking out both free and paid adult entertainment. Doran (2010) notes that about 80-90% of visitors to pornographic websites access only free pornographic material, suggesting that analysis of paid adult entertainment alone may obscure actual patterns of pornography consumption.
Our third data source, used in a recent study by Edelman (2009), records the number of subscriptions to one of the ten largest providers of paid pornographic content. Edelman’s analysis of this dataset was a novel contribution to the literature; previous studies of pornography use had examined only survey data. The data consist of the zip codes associated with all credit card subscriptions between 2006 and 2008. This particular content provider has hundreds of sites covering a broad range of adult entertainment. Edelman (2009) acknowledges, however, that “it is difficult to confirm rigorously that this seller is representative.”
Although this subscription data comes from a top-ten seller of adult entertainment, subscription rates are very low relative to the patterns of pornography use observed in survey data such as the NFSS, where 47% of adults report using pornography in the last year. The state with the most subscriptions per broadband household is Utah, with 5.47 subscriptions for every 1,000 households with broadband; the lowest is Montana, with 1.92 subscriptions for every 1,000 households with broadband. These low rates suggest that the market share of any individual content provider is small, which makes it difficult to know whether data from one provider can support an accurate cross-state comparison. As noted above, the vast majority of individuals who access pornography online use only free content rather than a paid site such as the one studied by Edelman (Doran, 2010).
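The rates above are subscriptions divided by broadband households, scaled to 1,000. A short sketch of that arithmetic, using hypothetical counts chosen only to reproduce Utah's reported rate (the underlying counts are not given in the text):

```python
def per_thousand(subscriptions, broadband_households):
    """Subscriptions per 1,000 broadband households."""
    return 1000 * subscriptions / broadband_households


def per_thousand_to_percent(rate):
    """Convert a per-1,000 rate into a percentage."""
    return rate / 10


# Hypothetical counts chosen to reproduce Utah's reported rate.
utah_rate = per_thousand(547, 100_000)          # 5.47 per 1,000
utah_pct = per_thousand_to_percent(utah_rate)   # about 0.55 percent
```

Even the highest state rate is therefore about half a percent of broadband households, far below the 47% of adults reporting use in the NFSS, although the units differ (households versus adults).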
Our fourth data source is page view data from Pornhub.com, which was the third largest online host of adult entertainment in the United States at the time. We use the Pornhub data because of the site's size and the availability of the data. Pornhub publicly released per capita page views for 2013, reported separately by state. Like Edelman’s data, the Pornhub data are a provider-side objective measure of pornography use. However, they record page views rather than subscribers, so they should reflect both the intensity of use per person and the breadth of use across the population. The data also have the relative advantage of including both paid and unpaid use.
Assessing the representativeness of new data sources
The big data revolution is beginning to dramatically expand the types of data sources that can be used to measure and study behaviors such as pornography use. The subscription data used by Edelman (2009) represent the type of large dataset that will increasingly become available to scholars. An important first step in using this type of proprietary data will be assessing the degree to which the data from a single provider are representative of the general population of interest. In this section, we provide a framework for assessing the representativeness of a dataset by comparing it with the patterns observed in another dataset that is known to be nationally representative, or with a combination of other data sources that collectively are likely to reflect the true underlying pattern of behavior.
In Table 1, we list the top ten and bottom ten states for pornography use based on each of the four sources: subscription data, Pornhub, NFSS, and Google Trends. Mississippi ranks among the top four states in pornography use in all four datasets, and Idaho consistently ranks among the lowest states on most of the measures. In contrast, other states such as Arkansas and Utah rank in the top ten on some measures but in the bottom ten on others. These results suggest that identifying the state with the highest rate of pornography use from a single data source can be problematic.
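The second option above, comparing a candidate source with a combination of other sources, can be sketched as follows. This is one possible implementation under stated assumptions. Each source is standardized, the standardized sources are averaged with equal weights into a composite, and the candidate is then correlated with that composite. The paper does not specify how sources would be combined. All names and data here are hypothetical, and the lists are assumed to be aligned in the same state order.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list: subtract the mean, divide by the population SD."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def composite_benchmark(sources):
    """Equal-weight average of standardized sources, state by state (assumed)."""
    standardized = [zscores(src) for src in sources]
    return [mean(column) for column in zip(*standardized)]


def representativeness(candidate, other_sources):
    """Correlation between a candidate source and the composite of the others."""
    return pearson(candidate, composite_benchmark(other_sources))


# Hypothetical values for four states from two benchmark sources.
a = [1, 2, 3, 4]
b = [2, 4, 6, 8]
r_same = representativeness([10, 20, 30, 40], [a, b])      # close to 1
r_reversed = representativeness([40, 30, 20, 10], [a, b])  # close to -1
```

Standardizing first keeps a source recorded on a large scale (such as page views) from dominating the composite.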
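The consistency check in this paragraph (states that are in the top ten on one source but in the bottom ten on another) can be expressed compactly. The sketch below is illustrative only, with hypothetical names and toy data. It uses k = 1 on three made-up states so the result is easy to verify by hand.

```python
def top_bottom(measure, k=10):
    """Return the sets of top-k and bottom-k states for one measure.

    measure: dict state -> level of pornography use (higher = more use).
    """
    ordered = sorted(measure, key=measure.get, reverse=True)
    return set(ordered[:k]), set(ordered[-k:])


def inconsistent_states(measures, k=10):
    """States in the top k on one measure and in the bottom k on another.

    measures: dict source name -> (dict state -> value).
    """
    tops, bottoms = {}, {}
    for name, measure in measures.items():
        tops[name], bottoms[name] = top_bottom(measure, k)
    return {
        state
        for a in measures
        for b in measures
        if a != b
        for state in tops[a] & bottoms[b]
    }


# Toy example with hypothetical sources and states.
toy = {
    "source_1": {"X": 3, "Y": 2, "Z": 1},
    "source_2": {"X": 1, "Y": 2, "Z": 3},
}
flagged = inconsistent_states(toy, k=1)  # X and Z switch extremes
```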
Table 1. Rank Order of States Based on Four Different Data Sources Controlled for Broadband Internet Access.
In Table 2 panel A, we estimate the correlation between each pair of data sources using the actual measures of pornography use from each source rather than the ordinal rankings reported in Table 1. The paid subscription data have by far the weakest correlations with the other three sources and are even negatively correlated with the NFSS survey data: the correlation is -0.0358 with the NFSS, 0.076 with Google Trends, and 0.0066 with Pornhub. None of these correlations is statistically significant; the corresponding t-statistics are all less than 0.6 in absolute value (one-sided p-values greater than .3). In contrast, the other three sources are substantially correlated with one another. Google Trends and Pornhub have a correlation of .487, NFSS and Google Trends a correlation of .655, and Pornhub and NFSS a correlation of .551. All of these correlations are statistically significant, with t-statistics of 3.78 (Google Trends and Pornhub), 5.68 (NFSS and Google Trends), and 4.28 (Pornhub and NFSS), all corresponding to one-sided p-values of less than .0004.
In panel B, we report correlations using the ordinal rankings created from each data source. The correlations among NFSS, Google Trends, and Pornhub have coefficients and significance levels comparable to those in panel A, and the correlation between Google Trends and the paid subscription data is also similar to its panel A value. The panel is notable because, with ordinal rankings, the paid subscription data correlate more strongly with the Pornhub and NFSS data, although these correlations remain insignificant. The two panels support similar conclusions, but the larger coefficients for the paid subscription data are worth noting even though they are insignificant and notably weaker than the correlations of the other sources with each other. We believe the correlations based on the actual measures of pornography use better represent the industry than those based on ordinal rankings, because they account for the size of the differences in pornography use rather than just the ordering of the states.
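The t-statistics above follow from the standard test of a zero Pearson correlation, t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom. The sketch below is self-contained and uses only the standard library. The number of states per pair is not reported in this section, so n = 45 in the example is an assumption.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists (e.g., state-level measures)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0 with n paired observations (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# NFSS vs. Google Trends, assuming 45 states with data on both sources.
t_nfss_trends = t_statistic(0.655, 45)     # about 5.68
# Paid subscription vs. NFSS, assuming 46 states.
t_paid_nfss = t_statistic(-0.0358, 46)     # about -0.24
```

With the assumed n = 45, r = .655 gives t ≈ 5.68, in line with the reported value. A one-sided p-value can then be taken from the upper tail of the t distribution with n − 2 degrees of freedom, for example with `scipy.stats.t.sf(t, n - 2)` if SciPy is available.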
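Correlating ordinal rankings, as in panel B, amounts to a Spearman rank correlation: each source is replaced by its ranks, and the Pearson correlation is computed on those ranks. A minimal standard-library sketch, assuming tied values share their average rank; the names are illustrative. Table 1 ranks 1 as the highest use, but the rank correlation is the same as long as both sources are ranked in the same direction.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# A monotone but nonlinear relationship: ranks agree perfectly, raw values do not.
x = [1, 2, 3, 4]
y = [10, 100, 1000, 10000]
rank_r = spearman(x, y)   # 1.0
value_r = pearson(x, y)   # below 1
```

The example shows why the two panels can differ. Ranks discard the size of the gaps between states, which is the reason given above for preferring the panel A measures.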
Table 2. Correlation between the Four Data Sources.
The significant correlations among the three sources other than the paid subscription data, despite the different quantities they measure (search volume, page views, and proportion of pornography viewers), suggest that they are measuring a real underlying pattern of variation in pornography use across states, one that is not correlated with the subscription data used by Edelman (2009).
Sensitivity of estimates to data source used
In order to illustrate the importance of accounting for the differences in state pornography rates across different data sources, we replicate the results of a recent study that found that more religious and more conservative states were more likely to search for sexual content on Google (MacInnis & Hodson, 2014). We examine whether the conclusions of that paper apply to other measures of pornography use using the other data sources that we have described in this paper. The results of this replication are given in Table 3. We standardized the pornography-use, religiosity, and conservatism measures by subtracting the mean and dividing by the standard deviation to allow for comparisons across the different pornography use measures (this approach is equivalent to converting each of the measures into a Z-score).
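A short sketch shows why standardizing makes results comparable across pornography measures recorded in different units. When both variables are converted to z-scores, the regression slope of one on the other equals their Pearson correlation, so rescaling a measure (for example, page views versus search indices) leaves the result unchanged. The data and function names below are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def ols_slope(x, y):
    """Slope from an ordinary least squares regression of y on x."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def standardized_slope(x, y):
    """Slope after converting both variables to z-scores.

    This equals the Pearson correlation of x and y, so it does not depend
    on the units in which either measure is recorded.
    """
    return ols_slope(zscores(x), zscores(y))


religiosity = [1, 2, 3, 4, 5]                # hypothetical state values
porn_use = [2, 4, 5, 4, 5]                   # hypothetical state values
rescaled = [100 * v + 7 for v in porn_use]   # same data in different units

slope = standardized_slope(religiosity, porn_use)
slope_rescaled = standardized_slope(religiosity, rescaled)  # same value
```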
Table 3. Correlations between State-Level Religiosity or Conservatism and Each Metric of Pornography Use.
In the original study, MacInnis and Hodson (2014) reported results from Google Trends data separately for specific search terms such as sex, porn, and XXX, which are similar to the terms in our Google Trends measure. The first row of Table 3 shows that, when we use the Google Trends data, we also find a statistically significant relationship between pornography searches and both religiosity and conservatism in most cases. However, the other rows of Table 3 show a much weaker statistical relationship for each of the other three data sources. These results suggest that if MacInnis and Hodson (2014) had used any of the other three data sources, they would likely have reached a different conclusion about the strength of the relationship they were examining.
The fact that MacInnis and Hodson (2014) find a statistically significant positive relationship between state-level religiosity and state-level pornography use is interesting, considering that past studies using individual-level data find that individuals who regularly attend church are much less likely to use pornography (Doran & Price, 2014; Patterson & Price, 2012; Stack, Wasserman, & Kern, 2004). This pattern, in which group-level relationships are the opposite of those found at the individual level, has also been found for the relationship between education and religion (Glaeser & Sacerdote, 2008) and between income and political affiliation (Glaeser & Sacerdote, 2007).
Each of the data sources considered above captures a different cross-sectional view of the online pornography industry, and each has important vulnerabilities for researchers interested in general levels of pornography use by state. NFSS survey data, for example, probably underreports pornography consumption because of social desirability bias and subjects’ faulty memory. Google Trends data fails to capture any pornography use that is accessed through means other than a Google search. Pornhub and paid subscription data may be limited in their representativeness; they measure use with respect to only a single firm in the industry.
When data from any source are used in research, results must be presented in the context of the data that produced them. Problems arise when a given data source is mistakenly interpreted as representing the entire pornography industry. There are many other settings in which similarly non-representative data may be erroneously overgeneralized. Researchers must be aware of the external validity of their findings, and the media and readers must be careful not to overgeneralize results.
We also recognize a limitation of our data sources: they capture the pornography industry at different points in time, namely Google Trends (2013-2014), paid subscription (2006-2008), Pornhub (2013), and NFSS (2012). The paid subscription data were collected approximately 6-7 years before the other sources. This time difference may bias our results; however, given the general patterns across the data sources as a whole, we believe our findings are accurate. For this bias to explain our results, the relative use of pornography across states would need to have shifted substantially between 2006 and 2013, which we believe is unlikely.
When attempting to rank order states or other units on some form of activity, multiple sources (if available) should be compared. If the orderings are similar, their accuracy can be assumed with more confidence. If they differ, the disagreement is an opportunity to learn more about the issue. In our case, the differences likely arise because the sources capture different types of pornography use.
Past research on pornography use has examined the degree to which it might affect important outcomes such as divorce, happiness, worker productivity, and sexual violence (Bergen & Bogle, 2000; Doran & Price, 2014; Patterson & Price, 2012; Young & Case, 2004). When such research is conducted, the data must come from a reliable and generalizable source (or sources). Findings about any such effects must also be considered in light of the age, gender, and sexual identity of individuals, factors that are not considered in this paper (Sevcikova & Daneback, 2014; Stoops, 2015; Traeen & Daneback, 2013; Tripodi et al., 2015). In such research, pornography use by state may play a role in the analysis. Given the results of this paper, the data source for such a variable must be carefully considered in any regression that uses it, and the results must be interpreted in the context of that data source.
Data provided by specific companies have the potential to provide important insights into public issues. A major challenge is determining when the data of a single company, even a very large one, can provide insights that are representative of the entire population. Assuming that relative rates of pornography use across states did not change substantially from 2006 to 2013, the results of our paper suggest that in some cases the information from a single company may give a misleading picture of the geographic patterns of a specific behavior. This can be particularly important for pornography use, since the vast majority of individuals who access pornography online access only free content rather than using a paid site (Doran, 2010).
The results of this paper draw on four different data sources about pornography use, including two that involve nationally representative data (Google Trends and NFSS). We find significant correlations among three of our data sources, suggesting that they reflect a similar underlying pattern of pornography use across states. In contrast, the paid subscription data, the one source that has received a fair amount of media attention, correlate rather poorly with the other sources. We also show that the choice of data source can affect the conclusions that studies draw, and we suggest that future studies include sensitivity tests across data sources when examining issues for which an ideal measure of the specific behavior is hard to obtain.
Bergen, R., & Bogle, K. (2000). Exploring the connection between pornography and sexual violence. Violence and Victims, 15
Buzzell, T. (2005). Demographic characteristics of persons using pornography in three technological contexts. Sexuality & Culture, 9, 28-48. http://dx.doi.org/10.1007/BF02908761
Carneiro, H. A., & Mylonakis, E. (2009). Google trends: A web‐based tool for real‐time surveillance of disease outbreaks. Clinical Infectious Diseases, 49, 1557-1564. http://dx.doi.org/10.1086/630200
Choi, H., & Varian, H. (2012). Predicting the present with Google trends. Economic Record, 88(s1), 2-9. http://dx.doi.org/10.1111/j.1475-4932.2012.00809.x
Cooper, A. (1998). Sexuality and the internet: Surfing into the new millennium. CyberPsychology & Behavior, 1, 187-193. http://dx.doi.org/10.1089/cpb.1998.1.187
Doran, K. (2010). Industry size, measurement, and social costs. In M. Eberstadt & M. A. Layden (Eds.), The social costs of pornography: A collection of papers. Princeton, NJ: The Witherspoon Institute.
Doran, K., & Price, J. (2014). Pornography and marriage. Journal of Family and Economic Issues, 35, 489-498. http://dx.doi.org/10.1007/s10834-014-9391-6
Edelman, B. (2009). Markets: Red light states: Who buys online adult entertainment? Journal of Economic Perspectives, 23(1), 209-220. http://dx.doi.org/10.1257/jep.23.1.209
Fisher, R. (1993). Social desirability bias and validity of indirect questioning. Journal of Consumer Research, 20, 303-315. http://dx.doi.org/10.1086/209351
Glaeser, E., & Sacerdote, B. (2007). Aggregation reversals and the social formation of beliefs. NBER Working Paper No. 13031. Retrieved from http://www.nber.org/papers/w13031.pdf
Glaeser, E., & Sacerdote, B. (2008). Education and religion. Journal of Human Capital, 2, 188-215. http://dx.doi.org/10.1086/590413
Hertlein, K., & Stevenson, A. (2010). The seven “As” contributing to internet-related intimacy problems: A literature review. Cyberpsychology: Journal of Psychosocial Research on Cyberspace, 4(1), article 1. Retrieved from http://www.cyberpsychology.eu/view.php?cisloclanku=2010050202
Ho, W., & Watters, P. (2004). Statistical and structural approaches to filtering internet pornography. In 2004 IEEE International Conference on Systems, Man and Cybernetics (Vol. 5, pp. 4792-4798).
MacInnis, C., & Hodson, G. (2014). Do American states with more religious or conservative populations search more for sexual content on Google? Archives of Sexual Behavior, 44, 137-147. http://dx.doi.org/10.1007/s10508-014-0361-8
Patterson, R., & Price, J. (2012). Pornography, religion, and the happiness gap: Does pornography impact the actively religious differently? Journal for the Scientific Study of Religion, 51, 79-89. http://dx.doi.org/10.1111/j.1468-5906.2011.01630.x
Preis, T., Moat, H., & Stanley, H. (2013). Quantifying trading behavior in financial markets using Google Trends. Scientific Reports, 3, 1684.
Sevcikova, A., & Daneback, K. (2014). Online pornography use in adolescence: Age and gender differences. European Journal of Developmental Psychology, 11, 674-686. http://dx.doi.org/10.1080/17405629.2014.926808
Stack, S., Wasserman, I., & Kern, R. (2004). Adult social bonds and the use of internet pornography. Social Science Quarterly, 85, 75–88. http://dx.doi.org/10.1111/j.0038-4941.2004.08501006.x
Stoops, J. (2015). Class and gender dynamics of the pornography trade in late nineteenth-century Britain. The Historical Journal, 58, 137-156. http://dx.doi.org/10.1017/S0018246X14000090
Traeen, B., & Daneback, K. (2013). The use of pornography and sexual behaviour among Norwegian men and women of differing sexual orientation. Sexologies, 22, e41-e48. http://dx.doi.org/10.1016/j.sexol.2012.03.001
Tripodi, F., Eleuteri, S., Giuliani, M., Rossi, R., Livi, S., Petruccelli, I., Petruccelli, F., Daneback, K., & Simonelli, C. (2015). Unusual online sexual interests in heterosexual Swedish and Italian university students. Sexologies. Advance online publication. http://dx.doi.org/10.1016/j.sexol.2015.03.003
Wright, P. (2011). U.S. males and pornography, 1973–2010: Consumption, predictors, correlates. Journal of Sex Research, 50, 60-71. http://dx.doi.org/10.1080/00224499.2011.628132
Young, K., & Case, C. (2004). Internet abuse in the workplace: New trends in risk management. CyberPsychology & Behavior, 7, 105-111. http://dx.doi.org/10.1089/109493104322820174
130 Faculty Office Building
United States of America