# Data collection

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Data_collection
> Markdown URL: https://mediated.wiki/source/Data_collection.md
> Source: https://en.wikipedia.org/wiki/Data_collection
> Source revision: 1316677537
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

{{Short description|Gathering information for analysis}}
{{Multiple issues|
{{More citations needed|date=April 2017}}
{{Prose|date=August 2021}}
}}

[[File:Automated weighbridge for Adélie penguins - journal.pone.0085291.g002.png|thumb|Example of data collection in the biological sciences: [Adélie penguin](/source/Ad%C3%A9lie_penguin)s are identified and weighed each time they cross the automated [weighbridge](/source/weighbridge) on their way to or from the sea.<ref name="Lescroël2014">{{Cite journal | last1 = Lescroël | first1 = A. L. | last2 = Ballard | first2 = G. | last3 = Grémillet | first3 = D. | last4 = Authier | first4 = M. | last5 = Ainley | first5 = D. G. | editor1-last = Descamps | editor1-first = Sébastien | title = Antarctic Climate Change: Extreme Events Disrupt Plastic Phenotypic Response in Adélie Penguins | doi = 10.1371/journal.pone.0085291 | journal = PLOS ONE | volume = 9 | issue = 1 | article-number = e85291 | year = 2014 | pmid =   24489657| pmc =  3906005| bibcode = 2014PLoSO...985291L | doi-access = free }}</ref> ]]

'''Data collection''' or '''data gathering''' is the process of gathering and [measuring](/source/measuring) [information](/source/information) on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes. [Data](/source/Data) collection is a [research](/source/research) component in all study fields, including [physical](/source/physical_science) and [social science](/source/social_science)s, [humanities](/source/humanities),<ref name=VuongetalSdata2018>{{cite journal|title=An open database of productivity in Vietnam's social sciences and humanities for public use|journal=Scientific Data|volume=5|page=180188|date=September 25, 2018|doi=10.1038/sdata.2018.188 |pmid=30251992|pmc=6154282|last1=Vuong|first1=Quan-Hoang|last2=La|first2=Viet-Phuong|last3=Vuong|first3=Thu-Trang|last4=Ho|first4=Manh-Toan|last5=Nguyen|first5=Hong-Kong T.|last6=Nguyen|first6=Viet-Ha|last7=Pham|first7=Hiep-Hung|last8=Ho|first8=Manh-Tung|bibcode=2018NatSD...580188V}}</ref> and [business](/source/business). While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same. The goal for all data collection is to capture evidence that allows [data analysis](/source/data_analysis) to lead to the formulation of credible answers to the questions that have been posed.

Regardless of the field of study or preference for defining data ([quantitative](/source/Quantitative_method) or [qualitative](/source/Qualitative_method)), accurate data collection is essential to maintain research integrity. The selection of appropriate data collection instruments (existing, modified, or newly developed) and clearly delineated instructions for their correct use reduce the likelihood of [errors](/source/measurement_error).

==Methodology==
{{missing information|experiment, sampling, measurement and preprocessing|date=July 2023}}
{{see also|scientific method}}
Data collection and [validation](/source/data_validation) consist of four steps when they involve taking a [census](/source/census) and seven steps when they involve [sampling](/source/sampling_(statistics)).<ref>Ziafati Bafarasat, A. (2021) Collecting and validating data: A simple guide for researchers. Advance. Preprint. https://doi.org/10.31124/advance.13637864.v1</ref>

A formal data collection process is necessary, as it ensures that the data gathered are both defined and accurate. This way, subsequent decisions based on arguments embodied in the findings are made using valid data.<ref>Data Collection and Analysis By Dr. Roger Sapsford, Victor Jupp {{ISBN|0-7619-5046-X}}</ref> The process provides both a baseline from which to measure and in certain cases an indication of what to improve.

===Tools===
====Data collection system====
{{main|Data collection system}}

==== Data management platform ====
{{main|Data management platform}}
''[Data management platform](/source/Data_management_platform)s'' (DMPs) are centralized storage and analytical systems for data, mainly used in [marketing](/source/marketing). DMPs exist to compile and transform large amounts of [demand and supply](/source/demand_and_supply) data into discernible information. Marketers may want to receive and use first-, second-, and third-party data. DMPs enable this because they are the aggregate system of [DSPs](/source/Demand-side_platform) (demand-side platforms) and [SSPs](/source/Supply-side_platform) (supply-side platforms). DMPs are integral to optimizing future advertising campaigns.

=== Data integrity issues ===
The main reason for maintaining [data integrity](/source/data_integrity) is to support the detection of errors in the data collection process. Those errors may be made intentionally (deliberate [falsification](/source/False_evidence)) or unintentionally ([random](/source/random_error) or [systematic errors](/source/systematic_errors)).<ref>{{cite web|last=Northern Illinois University|date=2005|title=Data Collection|url=https://ori.hhs.gov/education/products/n_illinois_u/datamanagement/dctopic.html|access-date=June 8, 2019|website=Responsible Conduct in Data Management}}</ref>
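
The difference between random and systematic errors can be illustrated with a small simulation. The following Python sketch is only an illustration under stated assumptions: the true value, noise level, and bias are hypothetical numbers, and `simulate_measurements` is a helper invented for this example. It suggests why averaging many readings tends to cancel random error but leaves a systematic offset in place.

```python
import random
import statistics


def simulate_measurements(true_value, n, noise_sd=0.0, bias=0.0, seed=0):
    """Return n simulated readings: true value + constant bias + Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


TRUE_VALUE = 4.0  # hypothetical true value, arbitrary units

# Random error only: readings scatter symmetrically around the true value.
random_only = simulate_measurements(TRUE_VALUE, n=10_000, noise_sd=0.5)

# Systematic error: the same scatter plus a constant offset (assumed +0.3),
# as a miscalibrated instrument might produce.
systematic = simulate_measurements(TRUE_VALUE, n=10_000, noise_sd=0.5, bias=0.3)

mean_random = statistics.fmean(random_only)
mean_systematic = statistics.fmean(systematic)

print(f"random error only:     mean = {mean_random:.3f}")
print(f"with systematic error: mean = {mean_systematic:.3f}")
```

In this sketch the mean of the first series lands close to the true value, while the mean of the second stays shifted by roughly the bias regardless of sample size, which is why systematic errors usually have to be addressed through calibration or protocol changes rather than through more data.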

There are two approaches that may protect data integrity and secure scientific validity of study results:<ref>{{cite journal |last1=Most |first1=Marlene M. |last2=Craddick |first2=Shirley |last3=Crawford |first3=Staci |last4=Redican |first4=Susan |last5=Rhodes |first5=Donna |last6=Rukenbrod |first6=Fran |last7=Laws |first7=Reesa |title=Dietary quality assurance processes of the DASH-Sodium controlled diet study |journal=Journal of the American Dietetic Association |date=October 2003 |volume=103 |issue=10 |pages=1339–1346|pmid= 14520254 |doi=10.1016/s0002-8223(03)01080-0}}</ref>
* Quality assurance – all actions carried out before data collection
* Quality control – all actions carried out during and after data collection

==== Quality assurance (QA)====
{{further|Quality assurance}}
QA's focus is prevention, which is primarily a cost-effective activity to protect the integrity of data collection. Standardization of protocol, with comprehensive and detailed procedure descriptions for data collection, is central to prevention. Poorly written guidelines often increase the risk of failing to identify problems and errors in the research process. Examples of such failures include:

* Uncertainty of timing, methods and identification of the responsible person
* Partial listing of items needed to be collected
* Vague description of data collection instruments instead of rigorous step-by-step instructions on administering tests
* Failure to recognize exact content and strategies for training and retraining staff members responsible for data collection
* Unclear instructions for using, making adjustments to, and calibrating data collection equipment
* No predetermined mechanism to document changes in procedures that occur during the investigation
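
The failures listed above can be read as a checklist of sections that a written protocol should contain. The following Python sketch is one possible way to encode that checklist. The class `CollectionProtocol`, its field names, and the helper `missing_sections` are hypothetical and chosen only for illustration. The check covers only whether each section is present, not whether it is written rigorously.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CollectionProtocol:
    """Hypothetical protocol record; one field per item in the list above."""
    schedule: str = ""                   # timing of data collection
    methods: str = ""                    # how data are collected
    responsible_person: str = ""         # who is accountable
    items_to_collect: List[str] = field(default_factory=list)
    instrument_instructions: str = ""    # step-by-step administration
    training_plan: str = ""              # training and retraining of staff
    calibration_instructions: str = ""   # use, adjustment, calibration
    change_log_procedure: str = ""       # how procedure changes are recorded


def missing_sections(protocol: CollectionProtocol) -> List[str]:
    """Return the names of sections that are empty, in declaration order."""
    return [name for name, value in vars(protocol).items() if not value]


# Usage: a draft protocol that fills in only some sections.
draft = CollectionProtocol(
    schedule="weekly",
    methods="structured interview",
    items_to_collect=["age", "household size"],
)
print(missing_sections(draft))
```

Running such a check before data collection begins fits the preventive role of QA, although a complete protocol still needs human review for clarity and rigor.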

====User privacy issues==== 
There are serious concerns about the integrity of individual user data collected by [cloud computing](/source/cloud_computing), because this data is transferred across countries that have different standards of protection for individual user data.<ref>{{cite book |last1=Wang |first1=Faye Fangfei |title=Law of Electronic Commercial Transactions: Contemporary Issues in the EU, US and China |date=10 January 2014 |publisher=Routledge |isbn=978-1-134-11522-8 |page=154
|url=https://books.google.com/books?id=Uz5PEAAAQBAJ&pg=PA154 |language=en}}</ref> Information processing has advanced to the level where user data can now be used to predict what an individual is saying before they even speak.<ref>{{cite web |title=Data, not privacy, is the real danger
|url=https://www.nbcnews.com/business/business-news/why-data-not-privacy-real-danger-n966621
|website=NBC News |date=4 February 2019
|language=en}}</ref>

==== Quality control (QC)====
{{further|Quality control}}
Because QC actions occur during or after data collection, all the details can be carefully documented. A clearly defined communication structure is a precondition for establishing monitoring systems. Uncertainty about the flow of information should be avoided, as a poorly organized communication structure leads to lax monitoring and can limit the opportunities for detecting errors. Quality control is also responsible for identifying the actions needed to correct faulty data collection practices and to minimize future occurrences. A [team](/source/team) is less likely to recognize the need for these actions if its procedures are written vaguely and are not based on feedback or education.

Data collection problems that necessitate prompt action:
* [Systematic error](/source/Systematic_error)s
* Violation of protocol 
* [Fraud](/source/Fraud) or scientific misconduct
* Errors in individual data items
* Individual staff or site performance problems
* [Shadow effect](/source/Shadow_Effect_(Genetics))
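
Two of the problems listed above, errors in individual data items and site performance problems, lend themselves to simple automated checks. The Python sketch below is a minimal illustration, not a complete QC system. The record layout, the field name `weight_kg`, the plausible range, and both helper functions are assumptions made for this example.

```python
from collections import defaultdict


def out_of_range(records, field_name, low, high):
    """Return records whose field is missing or falls outside [low, high]."""
    flagged = []
    for rec in records:
        value = rec.get(field_name)
        if value is None or not (low <= value <= high):
            flagged.append(rec)
    return flagged


def error_rate_by_site(records, flagged):
    """Return the fraction of flagged records for each collection site."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for rec in records:
        totals[rec["site"]] += 1
    for rec in flagged:
        errors[rec["site"]] += 1
    return {site: errors[site] / totals[site] for site in totals}


# Hypothetical records from two collection sites.
records = [
    {"id": 1, "site": "A", "weight_kg": 4.1},
    {"id": 2, "site": "A", "weight_kg": 3.9},
    {"id": 3, "site": "B", "weight_kg": 41.0},  # possible misplaced decimal
    {"id": 4, "site": "B", "weight_kg": None},  # missing value
    {"id": 5, "site": "B", "weight_kg": 4.4},
]

# The plausible range (2.0 to 8.0) is an assumption for this example.
flagged = out_of_range(records, "weight_kg", low=2.0, high=8.0)
rates = error_rate_by_site(records, flagged)

print([rec["id"] for rec in flagged])
print(rates)
```

A site with a much higher flagged rate than the others may point to a staff, equipment, or protocol problem, but the flags only mark records for review and do not by themselves establish the cause.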

== See also ==
{{columns-list|colwidth=20em|
* [Controlled experiment](/source/Experiment)
* [Data acquisition](/source/Data_acquisition)
* [Data curation](/source/Data_curation)
* [Data management](/source/Data_management)
* [Observational study](/source/Observational_study)
* [Sampling (statistics)](/source/Sampling_(statistics))
* [Scientific data archiving](/source/Scientific_data_archiving)
* [Statistical survey](/source/Statistical_survey)
* [Survey data collection](/source/Survey_data_collection)
* [Qualitative method](/source/Qualitative_method)
* [Quantitative method](/source/Quantitative_method)
* [Quantitative methods in criminology](/source/Quantitative_methods_in_criminology)
* [Data mining](/source/Data_mining)
}}

== References ==
{{Reflist}}

== External links ==
{{Commons}}
*[https://www.techtarget.com/searchcio/definition/data-collection?amp=1 All about data collection] – TechTarget.com

{{data}}
{{Statistics|collection}}
{{Portal bar|Business and economics|Science|Mathematics|Contents}}
{{Authority control}}

Category:Data collection
Category:Survey methodology
Category:Design of experiments

---
Adapted from the Wikipedia article [Data collection](https://en.wikipedia.org/wiki/Data_collection) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Data_collection?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.
