Circa 2009 or so, my absolute favorite app for the iPod Touch was Patrick Collison's Offline Wikipedia (yes, that Patrick Collison). You could download various wikis that had been pre-processed to fit in a very small space; as I recall, the entire English Wikipedia was a mere 2 GB. It was simply magical that I could have access to all of Wikipedia anytime, anywhere, offline, especially since the iPod Touch could only connect to the Internet via Wi-Fi. It was particularly useful while travelling, since I could load up articles and just read them on the plane.

As I recall, the app did several clever things to reduce the size of the dump: many stub and redirect articles were removed, the formatting was pared down to the bare minimum, and everything was compressed efficiently enough to fit in such a small space. Patrick gives more technical detail on an earlier version of the app's homepage.

Not related to the OP topic or ZIM, but I was looking into archiving my bookmarks and other content like documentation sites and wikis. I'll list some of the things I ended up using.

ArchiveBox: Pretty much a self-hosted Wayback Machine. It can save websites as plain HTML, screenshots, text, and some other formats. I have my bookmarks archived in it and have a bookmarklet to easily add new websites to it. If you use the docker-compose setup, you can enable a full-text search backend for easy searching.

WebRecorder: A browser extension that creates WACZ archives directly in the browser, capturing exactly the content you load. I use it on sites with annoying dynamic content that the Wayback Machine and ArchiveBox wouldn't be able to copy.

ReplayWeb: An interface to browse archive types like WARC, WACZ, and HAR. The interface is just like browsing through your browser. It can be self-hosted as well for the full offline experience.

Browsertrix-crawler: A CLI tool to scrape websites and output to WACZ. It's super easy to run with Docker, and I use it to scrape entire blogs and docs for offline use. It uses Chrome to load webpages and has some extra features like custom browser profiles, interactive login, and autoscroll/autoplay. I use the `--generateWACZ` parameter so I can use ReplayWeb to easily browse through the final output.

For bookmarks and miscellaneous webpage archiving, ArchiveBox should be more than enough. Check out this repo for an amazing list of tools and resources.

>"After reading this article, you'll be able to save all ~6 million pages of Wikipedia so you can access the sum of human knowledge regardless of internet connection!"

>"The current Wikipedia file dump in English is around 95 GB in size. This means you'll need something like a 128 GB flash drive to accommodate the large file."

Also, on a related note, there's an interesting philosophical question here:
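Returning to the quoted storage estimate: the jump from a 95 GB dump to a 128 GB drive is simple arithmetic. A 64 GB drive is too small, so the next common size up is 128 GB. The minimal sketch below makes that explicit. The list of drive sizes and the 5% headroom for filesystem overhead are illustrative assumptions, not figures from the quoted article. It also checks one practical catch: a FAT32-formatted drive cannot hold any single file larger than 4 GiB minus one byte, so a single-file dump this size needs a drive formatted with something like exFAT or NTFS.

```python
# Marketing capacities of common flash drives, in decimal gigabytes (1 GB = 10**9 bytes).
# This list is an illustrative assumption, not an exhaustive catalog.
COMMON_DRIVE_SIZES_GB = [16, 32, 64, 128, 256, 512]

# FAT32 cannot store a single file larger than 2**32 - 1 bytes (4 GiB minus one byte).
FAT32_MAX_FILE_BYTES = 2**32 - 1


def smallest_drive_gb(file_gb: float, headroom: float = 0.05) -> int:
    """Return the smallest listed drive size that fits the file plus headroom.

    `headroom` is a hypothetical safety margin (default 5%) for filesystem
    overhead; it is an assumption, not a measured figure.
    """
    needed = file_gb * (1 + headroom)
    for size in COMMON_DRIVE_SIZES_GB:
        if size >= needed:
            return size
    raise ValueError(f"no listed drive holds {file_gb} GB")


def fits_on_fat32(file_gb: float) -> bool:
    """True if a single file of this size could be stored on a FAT32 volume."""
    return file_gb * 10**9 <= FAT32_MAX_FILE_BYTES


if __name__ == "__main__":
    dump_gb = 95  # size quoted for the English Wikipedia dump
    print(smallest_drive_gb(dump_gb))  # 128
    print(fits_on_fat32(dump_gb))      # False
```

With 5% headroom, 95 GB needs about 99.75 GB of space, which rules out 64 GB and lands on 128 GB. The check against FAT32's 4 GiB limit fails, so you would need to reformat the drive before copying the dump.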