Archive Team

{{Short description|Group dedicated to web archiving and digital preservation}}
{{Infobox organization
| name = Archive Team
| formation = 2009
| logo = Archive Team logo.png
| logo_caption = Archive Team’s Logo
| founder = Jason Scott
| website = [https://wiki.archiveteam.org/ archiveteam.org]
}}
'''Archive Team''' is a group dedicated to digital preservation and web archiving that was co-founded by Jason Scott in 2009.<ref name=Scott /><ref name=ArchiveTeamWiki />

Its primary focus is the copying and preservation of content housed by at-risk online services. Some of its projects include the partial and complete preservation of services such as GeoCities,<ref name=Gilbertson /><ref name=Modine /> Yahoo! Video, Google Video, Friendster, FortuneCity,{{efn|name=ArchiveTeamSuccessStories}} TwitPic,<ref name=TwitPic /> SoundCloud,<ref name=Deahl /> and the "Aaron Swartz Memorial JSTOR Liberator".<ref name=Farivar /> Archive Team also archives URL shortener services<ref name=URLTEAM /> and wikis<ref name=WikiTeamGitHub /> on a regular basis. The content archived by Archive Team is usually made available in the Wayback Machine, which is the recommended way of accessing it.<ref>{{Cite web |title=Frequently Asked Questions - Archiveteam |url=https://wiki.archiveteam.org/index.php/Frequently_Asked_Questions |access-date=2025-05-26 |website=wiki.archiveteam.org}}</ref>

According to Jason Scott, "Archive Team was started out of anger and a feeling of powerlessness, this feeling that we were letting companies decide for us what was going to survive and what was going to die."<ref name=OSBKeynote /> Scott continues, "it's not our job to figure out what's valuable, to figure out what's meaningful. We work by three virtues: rage, paranoia, and kleptomania."<ref name=OSBKeynote2 />

== Warrior/Tracker system ==
<!-- Image (thumbnail): Scraping Telegram -->
Archive Team is a loose community of independent contributors.<ref>{{cite web |last1=Wodinsky |first1=Shoshana |last2=Mehrotra |first2=Dhruv |date=9 April 2021 |title=We're Archiving Yahoo Answers So You'll Always Know How Babby Is Formed |url=https://gizmodo.com/were-archiving-yahoo-answers-so-youll-always-know-how-b-1846643969 |website=Gizmodo |url-status=live |archive-url=https://web.archive.org/web/20250124105201/https://gizmodo.com/were-archiving-yahoo-answers-so-youll-always-know-how-b-1846643969 |archive-date=24 January 2025 |access-date=13 April 2025}}</ref><ref>{{cite web |last=Hill |first=Mark |date=12 May 2021 |title=Meet the Activist Archivists Saving the Internet From the Digital Dustbin |url=https://www.discovermagazine.com/technology/meet-the-activist-archivists-saving-the-internet-from-the-digital-dustbin |website=Discover Magazine |url-status=live |archive-url=https://web.archive.org/web/20241213061108/https://www.discovermagazine.com/technology/meet-the-activist-archivists-saving-the-internet-from-the-digital-dustbin |archive-date=13 December 2024 |access-date=13 April 2025}}</ref><ref>{{cite web |last=Mühlenmeier |first=Lennart |date=26 July 2023 |title=Shutdowns don't stop during the weekends |url=https://netzpolitik.org/2023/archive-team-shutdowns-dont-stop-during-the-weekends/ |website=netzpolitik.org |url-status=live |archive-url=https://web.archive.org/web/20250329180122/https://netzpolitik.org/2023/archive-team-shutdowns-dont-stop-during-the-weekends/ |archive-date=29 March 2025 |access-date=13 April 2025}}</ref> Its archival process relies on the "Warrior", a virtual machine environment that individuals run on their desktop computers to download content without needing technical expertise. A centrally managed Tracker communicates with the Warriors and allocates items to them. The Tracker also monitors user upload activity and displays a leaderboard.<ref name=Ogden2021 />
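
The division of labour between the Tracker and the Warriors can be illustrated with a minimal sketch. This is not Archive Team's actual software: the <code>Tracker</code> class, its method names, and the item and user names are all hypothetical, and the example only assumes what is described above, namely that a central tracker hands out items, records uploads, and ranks contributors.

```python
from collections import Counter, deque


class Tracker:
    """Hypothetical in-memory tracker: hands out work items and tallies uploads."""

    def __init__(self, items):
        self.queue = deque(items)   # items not yet handed to any Warrior
        self.claimed = {}           # item -> name of the Warrior working on it
        self.uploaded = Counter()   # Warrior name -> bytes uploaded so far

    def request_item(self, warrior):
        """Give the next unclaimed item to a Warrior, or None if none are left."""
        if not self.queue:
            return None
        item = self.queue.popleft()
        self.claimed[item] = warrior
        return item

    def report_done(self, warrior, item, size_bytes):
        """Record a finished item; only the Warrior that claimed it may report it."""
        if self.claimed.get(item) != warrior:
            raise ValueError(f"{item!r} is not claimed by {warrior!r}")
        del self.claimed[item]
        self.uploaded[warrior] += size_bytes

    def leaderboard(self):
        """Return (warrior, bytes) pairs, largest uploader first."""
        return self.uploaded.most_common()


# Two hypothetical Warriors each claim one item and report their uploads.
tracker = Tracker(["item-1", "item-2", "item-3"])
first = tracker.request_item("alice")
second = tracker.request_item("bob")
tracker.report_done("alice", first, 500)
tracker.report_done("bob", second, 1200)
print(tracker.leaderboard())  # [('bob', 1200), ('alice', 500)]
```

Keeping the queue and the tallies in one place is what lets many independent volunteers work on the same project without downloading the same item twice.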

=== Warrior Projects ===
There are several long-running Warrior projects:

* Imgur: The image host Imgur updated its terms of service on April 19, 2023. The update focused on removing old, unused, and inactive content that is not tied to a user account, along with NSFW content.<ref name=Imgur />
* Blogger: In May 2023, Google announced that inactive accounts across its platform, including Blogger blogs, would be deleted starting on December 1, 2023.<ref name=BloggerAT />
* Reddit: Reddit has banned communities that generate bad PR for Reddit Inc. and restricted access to its APIs and data on June 19, 2023.<ref name=KeyserSosa />
* Russian invasion of Ukraine: Archiving various .ua sites in the wake of the Russian government's invasion.<ref name=dotua />
* Telegram: Archiving public messages in various newsworthy or otherwise notable Telegram channels.<ref name=TelegramAT />
* GitHub: When Microsoft bought GitHub in 2018, many archivists and users worried that the site would become more restrictive. This project archives the UI parts of GitHub and the code of each repository.<ref name=GitHubAT />
* MediaFire: On December 18, 2020, users reported receiving emails from MediaFire stating that it planned to classify accounts as abandoned if they failed to meet certain criteria, starting in January.<ref name=MediaFireAT />
* Coronavirus Outbreak: Documenting and preserving data, events, and impacts of COVID-19 on society.<ref name=CoronavirusAT />
* YouTube: Saving metadata, thumbnails, comments, and selected videos. Archiving of videos and channels is limited to channels that may be deleted (because the company went bankrupt, the channel owner died, or YouTube is banning certain content) and channels related to world events and politics.<ref name=YouTubeAT />
* WikiTeam: Saving wiki XML dumps.<ref name=WikiTeamWiki />
* URLTeam: Archiving URL shortener services.<ref name=URLTeamAT />
* URLs: Archiving URLs from various sources.<ref name=URLs />

{{As of|2024|December|12}}, Archive Team's largest project is URLs, with over 10 petabytes archived.<ref name=Tracker />{{efn|The tracker uses units with binary prefixes, e.g. pebibytes (1024 TiB, about 1126 TB), instead of petabytes (1000 TB).}}
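
The difference between binary and decimal prefixes mentioned in the note can be checked with a short calculation. The function names below are illustrative only; the constants follow the standard definitions of the pebibyte (2<sup>50</sup> bytes), the petabyte (10<sup>15</sup> bytes), and the terabyte (10<sup>12</sup> bytes).

```python
PIB = 2 ** 50   # bytes in one pebibyte (binary prefix)
PB = 10 ** 15   # bytes in one petabyte (decimal prefix)
TB = 10 ** 12   # bytes in one terabyte (decimal prefix)


def pib_to_tb(pib):
    """Convert pebibytes to decimal terabytes."""
    return pib * PIB / TB


def pib_to_pb(pib):
    """Convert pebibytes to decimal petabytes."""
    return pib * PIB / PB


print(round(pib_to_tb(1)))      # 1126
print(round(pib_to_pb(10), 2))  # 11.26
```

A figure of 10 in the tracker's binary units therefore corresponds to roughly 11.26 petabytes in decimal units.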

== ArchiveBot ==
ArchiveBot is a web archiving system operated by the Archive Team for conducting curated crawls of websites. Controlled through an IRC channel, ArchiveBot allows volunteers to submit URLs for archiving, typically in response to site shutdowns, policy changes, or other events threatening online data.

Jobs are processed by a network of worker systems known as pipelines, which crawl and save content in the WARC (Web ARChive) format. Volunteers monitor active crawls (jobs) via a public dashboard and may apply ignore rules to handle problematic areas of websites—such as calendars, infinite scroll, or session-based content that can disrupt recursive crawling.<ref>{{Cite web |title=ArchiveBot - Archiveteam |url=https://wiki.archiveteam.org/index.php/ArchiveBot |access-date=2025-05-27 |website=wiki.archiveteam.org}}</ref>
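
The effect of ignore rules on a recursive crawl can be sketched as a URL filter. This is an illustration rather than ArchiveBot's own code: it assumes that ignore rules are regular expressions matched against each discovered URL, and the patterns, the <code>should_crawl</code> function, and the example.org URLs are all hypothetical.

```python
import re

# Hypothetical ignore rules; a real job would use patterns tuned to the site.
IGNORE_PATTERNS = [
    re.compile(r"/calendar/\d{4}/\d{2}"),  # calendar pages that never end
    re.compile(r"[?&](sessionid|sid)="),   # session-specific duplicates
    re.compile(r"[?&]page=\d{4,}"),        # runaway pagination
]


def should_crawl(url):
    """Return True if no ignore rule matches the URL."""
    return not any(pattern.search(url) for pattern in IGNORE_PATTERNS)


queue = [
    "https://example.org/about",
    "https://example.org/calendar/2031/07",
    "https://example.org/forum?sid=abc123",
    "https://example.org/list?page=3",
]
kept = [url for url in queue if should_crawl(url)]
print(kept)
# ['https://example.org/about', 'https://example.org/list?page=3']
```

Filtering at the point where URLs are queued keeps a crawler from following links that generate an unbounded number of near-identical pages.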

The results of ArchiveBot crawls are uploaded to the Internet Archive and are typically accessible through the Wayback Machine, where they can be viewed by the public.<ref>{{Cite web |title=ArchiveBot: The Archive Team Crowdsourced Crawler |url=https://archive.org/details/archivebot |access-date=2025-05-27 |website=archive.org}}</ref> ArchiveBot has been used to preserve a wide range of content, including user-generated platforms, news outlets, and government websites.<ref>{{Cite web |title=Domains - ArchiveBot Viewer |url=https://archive.fart.website/archivebot/viewer/domains |access-date=2025-05-27 |website=archive.fart.website}}</ref>

== See also ==
{{div col|colwidth=30em}}
* Anna's Archive
* Digital dark age
* Digital hoarding
* Flashpoint Archive
* List of digital preservation initiatives
{{div col end}}

== Notes ==
{{notelist|refs=

{{efn|name=ArchiveTeamSuccessStories|Sources covering Archive Team projects:<ref name=Sullivan /><ref name=Schwartz /><ref name=Garfield /><ref name=Masnick /><ref name=Scott2 /><ref name=Morton /><ref name=Misener /><ref name=Choudhury />}}

}}

== References ==
{{Reflist|30em|refs=

<ref name=ArchiveTeamWiki>{{Cite web|title=Revision history of "Main Page"|publisher=Archive Team|url=http://www.archiveteam.org/index.php?title=Main_Page&dir=prev&action=history |access-date=December 30, 2016|archive-url=https://web.archive.org/web/20161231075159/http://www.archiveteam.org/index.php?title=Main_Page&dir=prev&action=history|archive-date=2016-12-31|url-status=live}}</ref>

<ref name=BloggerAT>{{Cite web |title=Blogger - Archiveteam |url=https://wiki.archiveteam.org/index.php/Blogger |access-date=2024-01-02 |website=wiki.archiveteam.org}}</ref>

<ref name=Choudhury>{{Cite web|title=Amateur heroes of online heritage|last=Paul-Choudhury|first=Sumit|magazine=New Scientist|url=https://www.newscientist.com/article/dn20396-digital-legacy-amateur-heroes-of-online-heritage.html|url-status=live|archive-url=https://web.archive.org/web/20150402121921/http://www.newscientist.com/article/dn20396-digital-legacy-amateur-heroes-of-online-heritage.html|date=May 6, 2011|access-date=March 9, 2015|archive-date=April 2, 2015}}</ref>

<ref name=CoronavirusAT>{{Cite web |title=Coronavirus - Archiveteam |url=https://wiki.archiveteam.org/index.php/Coronavirus |access-date=2023-06-09 |website=wiki.archiveteam.org |archive-date=2023-06-09 |archive-url=https://web.archive.org/web/20230609170016/https://wiki.archiveteam.org/index.php/Coronavirus |url-status=live }}</ref>

<ref name=Deahl>{{cite web|url=https://www.theverge.com/2017/7/17/15986952/archive-team-back-up-soundcloud-warrior-project|title=Archive Team promises to back up SoundCloud amid worries of a shutdown |first=Dani |last=Deahl |date=2017-07-18|access-date=2018-11-28|archive-url=https://web.archive.org/web/20181021192516/https://www.theverge.com/2017/7/17/15986952/archive-team-back-up-soundcloud-warrior-project|archive-date=2018-10-21|url-status=live}}</ref>

<ref name=dotua>{{Cite web |title=.ua - Archiveteam |url=https://wiki.archiveteam.org/index.php/.ua |access-date=2023-06-09 |website=wiki.archiveteam.org |archive-date=2023-03-23 |archive-url=https://web.archive.org/web/20230323013739/https://wiki.archiveteam.org/index.php/.ua |url-status=live }}</ref>

<ref name=Farivar>{{Cite web|title=Aaron Swartz Memorial JSTOR Liberator sets public domain academic articles free |first=Cyrus |last=Farivar |url=https://arstechnica.com/tech-policy/2013/01/aaron-swartz-memorial-jstor-liberator-sets-public-domain-academic-articles-free/|date=2013-01-15|access-date=2018-11-28|archive-url=https://web.archive.org/web/20180323224110/https://arstechnica.com/tech-policy/2013/01/aaron-swartz-memorial-jstor-liberator-sets-public-domain-academic-articles-free/|archive-date=2018-03-23|url-status=live}}</ref>

<ref name=Garfield>{{Cite web|title=The Archive Team |last1=Garfield |first1=Bob |last2=Scott |first2=Jason |work=OnTheMedia |url=http://www.onthemedia.org/2012/mar/23/archive-team/|url-status=dead|archive-url=https://web.archive.org/web/20120427174039/http://www.onthemedia.org/2012/mar/23/archive-team/|date=2012-03-23|archive-date=2012-04-27|access-date=2012-04-19}}</ref>

<ref name=Gilbertson>{{Cite magazine|title=Geocities Lives On as Massive Torrent Download |last=Gilbertson |first=Scott |magazine=Wired |url=https://www.wired.com/epicenter/2010/11/geocities-lives-on-as-massive-torrent-download/|url-status=live|archive-url=https://web.archive.org/web/20120425190747/http://www.wired.com/epicenter/2010/11/geocities-lives-on-as-massive-torrent-download |date=2010-11-01 |archive-date=2012-04-25}}</ref>

<ref name=GitHubAT>{{Cite web |title=GitHub - Archiveteam |url=https://wiki.archiveteam.org/index.php/GitHub |access-date=2023-06-09 |website=wiki.archiveteam.org |archive-date=2023-05-27 |archive-url=https://web.archive.org/web/20230527000727/https://wiki.archiveteam.org/index.php/GitHub |url-status=live }}</ref>

<ref name=Imgur>{{Cite web |title=Imgur Terms of Service Update |url=https://help.imgur.com/hc/en-us/articles/14415587638029-Imgur-Terms-of-Service-Update |url-status=live |archive-url=https://web.archive.org/web/20230531003216/https://help.imgur.com/hc/en-us/articles/14415587638029-Imgur-Terms-of-Service-Update-April-19-2023- |archive-date=31 May 2023 |access-date=9 June 2023 |website=Imgur Help}}</ref>

<ref name=KeyserSosa>{{Cite web |last=Slowe |first=Christopher |date=2023-04-18 |title=An Update Regarding Reddit's API |url=http://www.reddit.com/r/reddit/comments/12qwagm/an_update_regarding_reddits_api/ |access-date=2023-06-09 |website=reddit.com |archive-date=2024-06-18 |archive-url=https://web.archive.org/web/20240618153526/https://old.reddit.com/r/reddit/comments/12qwagm/an_update_regarding_reddits_api/?limit=500 |url-status=live }}</ref>

<ref name=Masnick>{{Cite web|title=Historic Archive Of Websites From The January 18th SOPA Blackout|last=Masnick|first=Mike|website=Techdirt|url=http://www.techdirt.com/articles/20120410/23092818446/historic-archive-websites-january-18th-sopa-blackout.shtml|url-status=live|archive-url=https://web.archive.org/web/20120415042003/http://www.techdirt.com/articles/20120410/23092818446/historic-archive-websites-january-18th-sopa-blackout.shtml|date=2012-04-12|archive-date=2012-04-15}}</ref>

<ref name=MediaFireAT>{{Cite web |title=MediaFire - Archiveteam |url=https://wiki.archiveteam.org/index.php/MediaFire |access-date=2024-01-02 |website=wiki.archiveteam.org}}</ref>

<ref name=Misener>{{Cite web|title=Full Interview: Jason Scott on online video and digital heritage |last=Misener |first=Dan |publisher=CBC |url=http://www.cbc.ca/spark/2011/04/full-interview-jason-scott-on-online-video-and-digital-heritage/|url-status=live|archive-url=https://web.archive.org/web/20121026111503/http://www.cbc.ca/spark/2011/04/full-interview-jason-scott-on-online-video-and-digital-heritage/|date=2011-04-29 |archive-date=2012-10-26}}</ref>

<ref name=Modine>{{Cite web|title=Web 0.2 archivists save Geocities from deletion|last=Modine|first=Austin|website=The Register |url=https://www.theregister.co.uk/2009/04/28/geocities_preservation/|url-status=live|archive-url=https://web.archive.org/web/20120503090346/http://www.theregister.co.uk/2009/04/28/geocities_preservation/|date=2009-04-28|archive-date=2012-05-03}}</ref>

<ref name=Morton>{{Cite web|title=The Archive Team |last1=Morton |first1=Simon |last2=Scott |first2=Jason|work=RadioNZ|url=http://www.radionz.co.nz/national/programmes/thiswayup/audio/2511595/the-archive-team |url-status=live|archive-url=https://web.archive.org/web/20120421154803/http://www.radionz.co.nz/national/programmes/thiswayup/audio/2511595/the-archive-team|date=2012-03-03|archive-date=2012-04-21}}</ref>

<ref name=Ogden2021>{{cite journal |last1=Ogden |first1=Jessica |title="Everything on the internet can be saved": Archive Team, Tumblr and the cultural significance of web archiving |journal=Internet Histories |date=October 21, 2021 |volume=6 |issue=1–2 |pages=113–132 |doi=10.1080/24701475.2021.1985835|s2cid=239510759 |doi-access=free |hdl=1983/daef55ca-1fb1-4d91-a820-244bf24fe2b7 |hdl-access=free }}</ref>

<ref name=OSBKeynote>{{Cite web|title=Open Source Bridge 2012 Keynote - Jason Scott| website=YouTube | date=28 June 2012 |url=https://www.youtube.com/watch?v=tJqZGRIwtxk#t=1242s|access-date=2018-11-28|archive-url=https://web.archive.org/web/20170914044651/https://www.youtube.com/watch?v=tJqZGRIwtxk#t=1242s|archive-date=2017-09-14|url-status=live}}</ref>

<ref name=OSBKeynote2>{{Cite web|title=Open Source Bridge 2012 Keynote - Jason Scott| website=YouTube | date=28 June 2012 |url=https://www.youtube.com/watch?v=tJqZGRIwtxk#t=703s |access-date=2018-11-28|archive-url=https://web.archive.org/web/20170914044651/https://www.youtube.com/watch?v=tJqZGRIwtxk#t=703s |archive-date=2017-09-14|url-status=live}}</ref>

<ref name=Schwartz>{{Cite web |title=Fire in the Library|last=Schwartz|first=Matt|magazine=Technology Review |url=http://www.technologyreview.com/web/39317/|url-status=live|archive-url=https://web.archive.org/web/20120124203328/http://www.technologyreview.com/web/39317/|date=January 2012|archive-date=2012-01-24}}</ref>

<ref name=Scott>{{Cite web|title=Team Archive is GO|last=Scott|first=Jason|publisher=ASCII by Jason Scott |url=http://ascii.textfiles.com/archives/1664|date=January 6, 2009|access-date=December 30, 2016|archive-url=https://web.archive.org/web/20161102224557/http://ascii.textfiles.com/archives/1664|archive-date=2016-11-02|url-status=live}}</ref>

<ref name=Scott2>{{Cite web|title=Click: The Archive Team - Jason Scott talks about his mission to salvage our digital heritage |last=Scott |first=Jason |publisher=BBC |url=http://www.bbc.co.uk/programmes/p00przc2|url-status=dead|archive-url=https://web.archive.org/web/20150403081737/http://www.bbc.co.uk/programmes/p00przc2|date=2012-03-06|archive-date=2015-04-03}}</ref>

<ref name=Sullivan>{{Cite web|title=The 'Archive Team' Rescues User Content From Doomed Sites|last=Sullivan|first=Mark|magazine=PC World|url=https://www.pcworld.com/article/253672/the_archive_team_rescues_user_content_from_doomed_sites.html|url-status=live|archive-url=https://web.archive.org/web/20120420162819/http://www.pcworld.com/article/253672/the_archive_team_rescues_user_content_from_doomed_sites.html|date=2012-04-13|archive-date=2012-04-20}}</ref>

<ref name=TelegramAT>{{Cite web |title=Telegram - Archiveteam |url=https://wiki.archiveteam.org/index.php/Telegram |access-date=2023-06-09 |website=wiki.archiveteam.org |archive-date=2023-05-29 |archive-url=https://web.archive.org/web/20230529001705/https://wiki.archiveteam.org/index.php/Telegram |url-status=live }}</ref>

<ref name=Tracker>{{Cite web |title=URLs tracker Dashboard |url=https://tracker.archiveteam.org/urls/ |access-date=2024-12-12 |website=tracker.archiveteam.org |archive-date=2024-12-09 |archive-url=https://web.archive.org/web/20241209163256/https://tracker.archiveteam.org/urls/ |url-status=live }}</ref>

<ref name=TwitPic>{{Cite web|title=TwitPic - Archiveteam|url=https://www.archiveteam.org/index.php?title=TwitPic|access-date=2014-09-17|archive-url=https://web.archive.org/web/20140909075441/http://archiveteam.org/index.php?title=TwitPic|archive-date=2014-09-09|url-status=live}}</ref>

<ref name=URLs>{{Cite web |title=URLs - Archiveteam |url=https://wiki.archiveteam.org/index.php/URLs |access-date=2024-01-02 |website=wiki.archiveteam.org}}</ref>

<ref name=URLTEAM>{{Cite web|title=url shortening was a fucking awful idea|work=URLTE.AM|url=http://urlte.am/| url-status=live|archive-url=https://web.archive.org/web/20110611023208/http://urlte.am/|archive-date=2011-06-11}}</ref>

<ref name=URLTeamAT>{{Cite web |title=URLTeam - Archiveteam |url=https://wiki.archiveteam.org/index.php/URLTeam |access-date=2024-01-02 |website=wiki.archiveteam.org |language=en}}</ref>

<ref name=WikiTeamWiki>{{Cite web |title=WikiTeam - Archiveteam |url=https://wiki.archiveteam.org/index.php/WikiTeam |access-date=2024-01-02 |website=wiki.archiveteam.org}}</ref>

<ref name=WikiTeamGitHub>[https://github.com/WikiTeam/wikiteam WikiTeam] {{Webarchive|url=https://web.archive.org/web/20160210004433/https://github.com/WikiTeam/wikiteam|date=2016-02-10 |quote=Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to tiniest wikis. As of 2023, WikiTeam has preserved more than 350,000 wikis.}}</ref>

<ref name=YouTubeAT>{{Cite web |title=YouTube - Archiveteam |url=https://wiki.archiveteam.org/index.php/YouTube |access-date=2024-01-02 |website=wiki.archiveteam.org |language=en}}</ref>

}}

== External links ==
* {{official website|http://www.archiveteam.org}}
* [https://archive.org/details/archiveteam Archive Team collection] at Internet Archive
* {{Twitter}}
* {{YouTube|id=-2ZTmuX3cog|title=ARCHIVE TEAM: A Distributed Preservation of Service Attack}} by Jason Scott
* [https://www.reddit.com/r/ArchiveTeam ArchiveTeam subreddit] at reddit.com

Category:Jason Scott
Category:Organizations established in 2009
Category:2009 in Internet culture
Category:Web archiving initiatives