# URL 'whitelist': urls of these forms go into the keep pile.
# whitelist overrides blacklist and greylist.
# FORMAT:
# precede URL with ^ to whitelist urls that match the given prefix
# follow URL with $ to whitelist urls that match the given suffix
# ^url$ will whitelist urls that match the given url completely
# Without a ^ or $ symbol, urls containing the given url anywhere will get whitelisted

# Special exception for this url on yale.edu, since we needed to blacklist
# some particular other urls on yale.edu
http://korora.econ.yale.edu/phillips/archive/hauraki.htm

# We've added .ru$ sites to the blacklist, but the following
# Russian websites contain actual Maori language content
http://www.krassotkin.ru/sites/prayer.su/maori/
https://mi.centr-zashity.ru/

# WHITELIST WEBSITES THAT HAVE NON-AUTOMATED /mi/ SUBSECTIONS
# WE CONTROL WHAT PART OF THEM WILL BE DOWNLOADED (THE /mi SUBSECTION)
# IN sites-too-big-to-exhaustively-crawl.txt
#https://www.martinvrijland.nl/mi/te-mana-hinengaro/Ko-te-nuinga-ake-o-nga-tangata-kei-te-timata-ki-te-kite-kei-te-noho-tatou-i-roto-i-te-whakaata-ko-te-aha-tenei/
#https://www.csunplugged.org/mi/principles/
#http://www.gpedia.com/mi/gpedia/Reo_M%C4%81ori
https://www.martinvrijland.nl
https://www.csunplugged.org
http://www.gpedia.com