PSA2016: The 25th Biennial Meeting of the Philosophy of Science Association

Big Data and The Evaluation of Expertise

The growing use of databases to share and reuse data in the sciences has generated both new sources of knowledge and new potential problems in generating that knowledge. Databases take substantial work to maintain: personnel must manage the data as it comes in and adjust the structure and form of the database in response to new datasets. Researchers who use the data in a database for new analyses need enough information about the data to evaluate its reliability and quality. The purpose of creating a database is to provide a repository of data that can be consistently reused, but the majority of the datasets in these databases will never be reused, precisely because there is not enough information to evaluate their reliability and quality.
Much of the work done on databases is completed by data curators, whose experience with and knowledge of the datasets in the database may vary widely. Social epistemology may have resources that can help with some of the issues in Big Data. In “Experts: Which Ones Should You Trust?”, Alvin Goldman provides a way for a novice with limited knowledge to evaluate the claims of two purported experts. The curators of a database may be in a similar position to the novice in Goldman’s account, so the types of evidence he says a novice can use could inform how curators should proceed. Goldman suggests that a novice can use meta-scores by other experts, credentials, possible biases, and the track record of a given expert to decide whether or not to trust that expert. Similarly, a data curator may be able to use the same sources of evidence to evaluate whether a researcher or research group’s data should be included in a database. Many of these sources of evidence seem intuitively plausible.
However, putting Goldman’s sources of evidence into practice may create further problems for a curator. A curator must be able to decide in a timely manner whether a dataset from a researcher or research group is worth including; extensive research on their part should not be required. Currently, gathering all the types of evidence Goldman suggests would require time-consuming research. Even if that problem is surmounted, others remain. Meta-scores may be difficult to create if the datasets come from different fields or subfields that evaluate methodology differently. Credentials are unlikely to be decisive, since the majority of researchers will have roughly the same credentials. Some biases, such as those stemming from funding sources that influence results, may be easier to detect than community-wide biases. Even if the track record of a researcher or research group were readily available, it would still be unclear what in that record matters for determining the reliability of the data. For Goldman’s sources of evidence to be useful to curators, many of these problems must be resolved.

Author Information:

Darcy McCusker    
Philosophy Department
University of Washington