From ocl at gih.com Sat Feb 6 16:32:34 2016
From: ocl at gih.com (Olivier MJ Crepin-Leblond)
Date: Sat, 6 Feb 2016 17:32:34 +0100
Subject: [IPv6crawler-wg] An important update about the IPv6 Matrix Project
In-Reply-To: <56575051.3070402@gih.com>
References: <56575051.3070402@gih.com>
Message-ID: <56B62022.7060009@gih.com>

Hello all,

another update: the first complete run using the new TLDs has completed!
You can view the results up to February 2016 at http://www.ipv6matrix.org

In adding new gTLDs we have hit a snag, although this snag does not
significantly affect overall results, since it appears to affect only a
tiny number of domains.

I am speaking about Internationalized Domain Name (IDN) top-level domains:

xn--3e0b707e xn--80adxhks xn--90ais xn--j1amh xn--pgbs0dh xn--wgbl6a
xn--4gbrim xn--80asehdb xn--d1acj3b xn--p1ai xn--q9jyb4c

Each of these is the ASCII equivalent of a non-ASCII domain name. Whilst
the Crawler works well with them and we are able to collect all of the
data pertaining to crawls in IDNs, the program that builds the database
uses SQLite. Until now, database entries made use of domain names that
were pure ASCII, but IDNs use a double dash "--" in the domain. SQLite
coughs on the dash, so we have not been able to produce the database
needed to display the results when including IDNs.

Until we have a workaround, I have manually isolated the data collected
for IDNs, which means we still collect it, but we will not take it into
account in the final database results. As I have said, this is a tiny
subset of domains: 760 entries out of a total of 1 million domains.

I am *still* drafting a very long article for RIPE Labs. In fact, we
might publish this in two parts.
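(One plausible explanation for the snag, for those curious: "--" begins a
line comment in SQL, so an IDN TLD used as a bare identifier both contains
illegal hyphens and swallows the rest of the statement. The sketch below is
illustrative only -- the table-per-TLD layout and names are assumptions,
not the project's actual database builder -- and shows that double-quoting
the identifier, as standard SQL allows, makes any name acceptable.)

```python
import sqlite3

# Plausible reconstruction of the IDN snag, assuming one table per TLD.
# Names here are illustrative, not taken from the IPv6 Matrix code.
conn = sqlite3.connect(":memory:")
tld = "xn--p1ai"

err = None
try:
    # Unquoted, the hyphens are illegal in a bare identifier and "--"
    # starts an SQL comment, so the statement cannot parse.
    conn.execute(f"CREATE TABLE {tld} (domain TEXT, has_ipv6 INTEGER)")
except sqlite3.Error as e:
    err = e
print("unquoted identifier:", err)

# Double-quoting the identifier (standard SQL) accepts any name:
conn.execute(f'CREATE TABLE "{tld}" (domain TEXT, has_ipv6 INTEGER)')
conn.execute(f'INSERT INTO "{tld}" VALUES (?, ?)', ("example." + tld, 0))
stored = conn.execute(f'SELECT domain FROM "{tld}"').fetchone()[0]
print("quoted identifier stored:", stored)
```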
In the meantime, the results appear to be somewhat consistent with the
results of other tracking projects, some of which use other methods to
track IPv6 adoption:

- http://6lab.cisco.com/stats/
- https://www.vyncke.org/ipv6status/
- http://www.mrp.net/ipv6_survey/

We now have 306 GB of comma-separated value text data in store, tracing
back the spread of the IPv6 Internet since July 2010. (294 GB in
November 2015)

I look forward to your kind feedback.

Warmest regards,
Olivier

On 26/11/2015 19:32, Olivier MJ Crepin-Leblond wrote:
> Hello all,
>
> Two worthy pieces of news regarding the IPv6 Matrix Project (
> http://www.ipv6matrix.org ):
>
> 1. I have updated the Web site with the latest results ending in late
> October - hence the Crawl display date of November 2015.
> We now have 294 GB of comma-separated value text data in store,
> tracing back the spread of the IPv6 Internet since July 2010.
> Altogether, we ran the test approximately 36 times on all 1 million
> Alexa busiest domain names. This represented testing about 6.5
> million hosts, carefully collecting traceroute information for each
> and every one of them. We now have a unique database showing the
> spread of IPv6 Internet information sources worldwide.
>
> 2. Today I took out my very dusty Linux & Python gloves and performed
> a much-needed update to the IPv6 Matrix Crawler input database,
> including the Alexa 1 million list as well as the GeoIP databases.
>
> Indeed, the Alexa database of the world's 1 million busiest Web sites
> dated from the Crawler's inception in the first half of 2010.
> We're more than 5 years later!
>
> In a way, keeping the same input database kept the crawl base steady
> and thus made it possible to compare results. However, the flip side
> of the coin is that we ended up with more and more domain names
> marked as dysfunctional. Nearly 5% of the domain names in the
> database were unreachable.
The updated input database > should resolve this, but we might also see a jump in some results. It > will be interesting to see what the next run yields. > Why do we not update the input database more often? Because buried in > that database are the domain names of the people who wanted to opt out > over the years. Having never thought about this, I spent several hours > tracing back 5 years of emails of people complaining about the crawl > triggering their firewalls. I put together a blacklist of domain names > I have manually deleted from the crawl input files. > The blacklist, as it stands now: > > Deleted: > > it-mate.co.uk > indianic.com > your-server.de > catacombscds.com > dewlance.com > tcs.com > printweb.de > nocser.net > shoppingnsales.com > bsaadmail.com > epayservice.ru > 4footyfans.com > guitarspeed99.com > saga.co.uk > > Already gone from the current Alexa list: > > infinityautosurf.com > canada-traffic.com > usahitz.com > jawatankosong.com.my > 4d.com.my > fitnessuncovered.co.uk > kualalumpurbookfair.com > xgen-it.com > bpanet.de > edns.de > back2web.de > waaaouh.com > every-web.com > w3sexe.com > gratuits-web.com > france-mateur.com > pliagedepapier.com > immobilieretparticuliers.com > chronobio.com > stickers-origines.com > tailor-made.co.uk > > With these out of the input files, we are able to start the next crawl. * > I hope I have not missed any complaints, but if I have, this is > advance notice that we might receive a few emails in the forthcoming > weeks. We might also receive a few emails from sites that have > appeared on the Alexa 1 million list since 2010.* > > Back to this list, the excellent filtering program which was used to > process the original list and clean it up was used again for the > modern list. The Alexa list had a number of domain names which were > actually sub-directories in the past, as well as some invalid domains. > Alexa has since tightened its act. The latest Alexa list is much > cleaner. 
It holds 999998 valid domains vs. 984587 domains for the > original 2010 list. > > Finally, new gTLDs have now appeared in the Alexa list, including some > Internationalised Domain Names (IDNs). The world is indeed a very > different place! > It will be interesting to see how the Crawler as well as all other > scripts to process the information into displayable data on the Web > server, will cope with these: > > academy.csv consulting.csv guide.csv one.csv > supply.csv > accountant.csv contractors.csv guru.csv onl.csv > support.csv > actor.csv cool.csv hamburg.csv online.csv > surf.csv > ads.csv country.csv haus.csv ooo.csv > swiss.csv > adult.csv creditcard.csv healthcare.csv orange.csv > sydney.csv > agency.csv cricket.csv help.csv ovh.csv > systems.csv > alsace.csv cymru.csv hiphop.csv paris.csv > taipei.csv > amsterdam.csv dance.csv holiday.csv partners.csv > tattoo.csv > app.csv date.csv horse.csv parts.csv > team.csv > archi.csv dating.csv host.csv party.csv > tech.csv > associates.csv deals.csv hosting.csv photo.csv > technology.csv > attorney.csv delivery.csv house.csv photography.csv > theater.csv > auction.csv desi.csv how.csv photos.csv > tienda.csv > audio.csv design.csv immobilien.csv pics.csv > tips.csv > axa.csv dev.csv immo.csv pictures.csv > tirol.csv > barclaycard.csv diet.csv ink.csv pink.csv > today.csv > barclays.csv digital.csv international.csv pizza.csv > tokyo.csv > bar.csv direct.csv investments.csv place.csv > tools.csv > bargains.csv directory.csv irish.csv plus.csv > top.csv > bayern.csv discount.csv jetzt.csv poker.csv > town.csv > beer.csv dog.csv joburg.csv porn.csv > toys.csv > berlin.csv domains.csv juegos.csv post.csv > trade.csv > best.csv earth.csv kim.csv press.csv > training.csv > bid.csv education.csv kitchen.csv prod.csv > trust.csv > bike.csv email.csv kiwi.csv productions.csv > university.csv > bio.csv emerck.csv koeln.csv properties.csv > uno.csv > black.csv energy.csv krd.csv property.csv > uol.csv > blackfriday.csv 
equipment.csv kred.csv pub.csv > vacations.csv > blue.csv estate.csv land.csv quebec.csv > vegas.csv > bnpparibas.csv eus.csv law.csv realtor.csv > ventures.csv > boo.csv events.csv legal.csv recipes.csv > video.csv > boutique.csv exchange.csv life.csv red.csv > vision.csv > brussels.csv expert.csv limited.csv rehab.csv > voyage.csv > build.csv exposed.csv link.csv reise.csv > wales.csv > builders.csv express.csv live.csv reisen.csv > wang.csv > business.csv fail.csv lol.csv ren.csv > watch.csv > buzz.csv faith.csv london.csv rentals.csv > webcam.csv > bzh.csv farm.csv love.csv repair.csv > website.csv > cab.csv finance.csv luxury.csv report.csv > wien.csv > camera.csv fish.csv management.csv rest.csv > wiki.csv > camp.csv fishing.csv mango.csv review.csv > win.csv > capital.csv fit.csv market.csv reviews.csv > windows.csv > cards.csv fitness.csv marketing.csv rio.csv > work.csv > care.csv flights.csv markets.csv rip.csv > works.csv > career.csv foo.csv media.csv rocks.csv > world.csv > careers.csv football.csv melbourne.csv ruhr.csv > wtf.csv > casa.csv forsale.csv menu.csv ryukyu.csv > xn--3e0b707e.csv > cash.csv foundation.csv microsoft.csv sale.csv > xn--4gbrim.csv > casino.csv frl.csv moda.csv scb.csv > xn--80adxhks.csv > center.csv fund.csv moe.csv school.csv > xn--80asehdb.csv > ceo.csv futbol.csv monash.csv science.csv > xn--90ais.csv > chat.csv gal.csv money.csv scot.csv > xn--d1acj3b.csv > church.csv gallery.csv moscow.csv services.csv > xn--j1amh.csv > city.csv garden.csv movie.csv sexy.csv > xn--p1ai.csv > claims.csv gent.csv nagoya.csv shiksha.csv > xn--pgbs0dh.csv > click.csv gift.csv network.csv shoes.csv > xn--q9jyb4c.csv > clinic.csv gifts.csv new.csv singles.csv > xn--wgbl6a.csv > clothing.csv glass.csv news.csv site.csv > xxx.csv > club.csv global.csv nexus.csv social.csv > xyz.csv > coach.csv globo.csv ngo.csv software.csv > yandex.csv > codes.csv gmail.csv ninja.csv solar.csv > yoga.csv > coffee.csv goo.csv nrw.csv solutions.csv > yokohama.csv 
> college.csv goog.csv ntt.csv soy.csv youtube.csv
> community.csv google.csv nyc.csv space.csv zone.csv
> company.csv graphics.csv office.csv style.csv
> computer.csv gratis.csv okinawa.csv sucks.csv
>
> In the meantime I'd like to cite again the Nile University Crew,
> expertly led by Sameh El Ansary, for designing and coding a Crawler
> that's been able to cope with sifting through 5 years of DNS junk with
> minimal maintenance, save the love and attention I give the servers by
> keeping them up to date with patches so they don't end up toppling
> over. They haven't been rebooted in 464 days and I am crossing fingers
> for their well-being.
> And of course, thanks to the University of Southampton Crew who built
> the excellent 2nd version of the Web site under Tim Chown's supervision.
>
> I am still writing an article for RIPE Labs - just struggling to find
> the time to finish it, but getting there.
>
> Warmest regards,
>
> Olivier
>
> _______________________________________________
> IPv6crawler-wg mailing list
> IPv6crawler-wg at gih.co.uk
> http://gypsy.gih.co.uk/mailman/listinfo/ipv6crawler-wg

-- 
Olivier MJ Crépin-Leblond, PhD
http://www.gih.com/ocl.html

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cdel at firsthand.net Sat Feb 6 17:09:54 2016
From: cdel at firsthand.net (Christian de Larrinaga)
Date: Sat, 06 Feb 2016 17:09:54 +0000
Subject: [IPv6crawler-wg] An important update about the IPv6 Matrix Project
In-Reply-To: <56B62022.7060009@gih.com>
References: <56575051.3070402@gih.com> <56B62022.7060009@gih.com>
Message-ID: <56B628E2.5090601@firsthand.net>

That is a humongous SQLite database! Or are you only collecting the data
as a form of cache using SQLite and then exporting it once organised
into CSV?

SQLite v3 supports UTF-8, which might help? If it doesn't break
something else, of course.
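(For what it's worth, SQLite 3 already stores TEXT as UTF-8 by default, so
the Unicode forms of these TLDs can be stored and retrieved directly. A
quick illustrative check, not project code -- the table name and columns
are invented, and Python's built-in "idna" codec does the Punycode
conversion:)

```python
import sqlite3

# Quick check (not project code): SQLite 3 stores TEXT as UTF-8 by
# default, so the Unicode form of an IDN TLD round-trips unmodified.
ascii_tld = "xn--p1ai"                                   # Punycode A-label
unicode_tld = ascii_tld.encode("ascii").decode("idna")   # Unicode U-label

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tlds (ascii_form TEXT, unicode_form TEXT)")
conn.execute("INSERT INTO tlds VALUES (?, ?)", (ascii_tld, unicode_tld))

fetched = conn.execute("SELECT unicode_form FROM tlds").fetchone()[0]
print(fetched)  # prints the Unicode form of the TLD
```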
C

Olivier MJ Crepin-Leblond wrote:
> Hello all,
>
> another update: the first complete run using the new TLDs has completed!
> You can view the results up to February 2016 at
> http://www.ipv6matrix.org
>
> [...]
>
> I am speaking about Internationalized Domain Name (IDN) top-level
> domains:
>
> [...]
>
> Each of these is the ASCII equivalent of a non-ASCII domain name.
> Whilst the Crawler works well with them and we are able to collect all
> of the data pertaining to crawls in IDNs, the program that builds the
> database uses SQLite. Until now, database entries made use of domain
> names that were pure ASCII, but IDNs use a double dash "--" in the
> domain. SQLite coughs on the dash, so we have not been able to produce
> the database needed to display the results when including IDNs.
>
> [...]

-- 
Christian de Larrinaga FBCS, CITP,
-------------------------
@ FirstHand
-------------------------
+44 7989 386778
cdel at firsthand.net
-------------------------

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ocl at gih.com Sat Feb 6 17:23:00 2016
From: ocl at gih.com (Olivier MJ Crepin-Leblond)
Date: Sat, 6 Feb 2016 18:23:00 +0100
Subject: [IPv6crawler-wg] An important update about the IPv6 Matrix Project
In-Reply-To: <56B628E2.5090601@firsthand.net>
References: <56575051.3070402@gih.com> <56B62022.7060009@gih.com> <56B628E2.5090601@firsthand.net>
Message-ID: <56B62BF4.8050901@gih.com>

Hello Christian,

the SQLite database comes in when it comes down to displaying the
results. The results of the crawls are in native CSV. All 306 GB of
these.
The Sqlite database is much smaller as it only uses a subset of all data collected (the data which is used in the GUI) and we are not using a single Sqlite database but one for each crawl - a summary of each crawl for each TLD. The question of Sqlite v3 is a good one -- and I have unfortunately got no idea whether it would work or whether it would break things. To be added to the list of things to do. Kindest regards, Olivier On 06/02/2016 18:09, Christian de Larrinaga wrote: > That is a humungous large sqlite database! or are you only collecting > the data as a form of cache using sqlite and then exporting it out > once organised into csv? > > Sqlite v3 supports utf-8 which might help? > if it doesn't break something else of course. > > C > > Olivier MJ Crepin-Leblond wrote: >> Hello all, >> >> another update: the first complete run using the new TLDs has completed! >> You can view the results up to February 2016 from >> http://www.ipv6matrix.org >> >> In adding new gTLDs we have hit a snag, although this snag does not >> significantly affect overall results since it appears to only affect >> a tiny number of domains. >> >> I am speaking about Internationalized Top Level Domains (IDNs): >> >> xn--3e0b707e xn--80adxhks xn--90ais xn--j1amh xn--pgbs0dh >> xn--wgbl6a >> xn--4gbrim xn--80asehdb xn--d1acj3b xn--p1ai xn--q9jyb4c >> >> Each of these is the ASCII equivalent of a non ASCII domain name. >> Whist the Crawler works well with them and we are able to collect all >> of the data pertaining to crawls in IDNs, the program that builds the >> Database uses SQLite. Until now, database entries made use of domain >> names that were ASCII - but IDNs use a double dash "--" in the >> domain. SQLite coughs on DASH - so we have not been able to produce >> the database needed for the displaying of the results when including >> IDNs. 
>> >> Until we have a workaround, I have manually isolated data collected >> for IDNs, which means we still collect them, but we will not take >> them into account in the final database results. As I have said, this >> is a tiny subset of domains: 760 entries out of a total of 1 Million >> domains. >> >> I am *still* drafting a very long article for RIPE labs. In fact, we >> might publish this in two parts. In the meantime, the results appear >> to be somehow consistent with results of other tracking projects, >> some of which use other methods to track IPv6 adoption: >> >> - http://6lab.cisco.com/stats/ >> - https://www.vyncke.org/ipv6status/ >> - http://www.mrp.net/ipv6_survey/ >> >> We now have 306 Gb of comma separated value text data in store, >> tracing back the spread of the IPv6 Internet since July 2010. (294Gb >> in November 2015) >> >> I look forward to your kind feedback. >> >> Warmest regards, >> Olivier >> >> >> On 26/11/2015 19:32, Olivier MJ Crepin-Leblond wrote: >>> Hello all, >>> >>> Two worthy pieces of news regarding the IPv6 Matrix Project ( >>> http://www.ipv6matrix.org ): >>> >>> 1. I have updated the Web site with the latest results ending in >>> late October - hence noting a Crawl display date of November 2015. >>> We now have 294 Gb of comma separated value text data in store, >>> tracing back the spread of the IPv6 Internet since July 2010. >>> Altogether, we ran the text approximately 36 times on all 1 million >>> Alexa busiest Domain names. This represented testing of about 6.5 >>> million hosts, carefully collecting traceroute information for each >>> and every of them. We now have a very unique database that is >>> showing the spread of the IPv6 Internet information sources worldwide. >>> >>> 2. Today I took out my very dusty Linux & Python gloves and >>> performed a much needed update to the IPv6 Matrix Crawler input >>> database, including the Alexa 1 million list as well as GeoIP Databases. 
>>> >>> Indeed, the Alexa database of the world's 1 million busiest Web >>> sites dated from the Crawler's first inception in the first half of >>> 2010. >>> We're more than 5 years later! >>> >>> In a way, keeping the same input database has kept the base of >>> crawls the steady thus the ability to compare results was possible. >>> However, the flip-side of the coin is that we are ending up with >>> more and more domain names marked as being dysfunctional. Nearly 5% >>> of the domain names in the database were unreachable. The updated >>> input database should resolve this, but we might also see a jump in >>> some results. It will be interesting to see what the next run yields. >>> Why do we not update the input database more often? Because buried >>> in that database are the domain names of the people who wanted to >>> opt out over the years. Having never thought about this, I spent >>> several hours tracing back 5 years of emails of people complaining >>> about the crawl triggering their firewalls. I put together a >>> blacklist of domain names I have manually deleted from the crawl >>> input files. >>> The blacklist, as it stands now: >>> >>> Deleted: >>> >>> it-mate.co.uk >>> indianic.com >>> your-server.de >>> catacombscds.com >>> dewlance.com >>> tcs.com >>> printweb.de >>> nocser.net >>> shoppingnsales.com >>> bsaadmail.com >>> epayservice.ru >>> 4footyfans.com >>> guitarspeed99.com >>> saga.co.uk >>> >>> Already gone from the current Alexa list: >>> >>> infinityautosurf.com >>> canada-traffic.com >>> usahitz.com >>> jawatankosong.com.my >>> 4d.com.my >>> fitnessuncovered.co.uk >>> kualalumpurbookfair.com >>> xgen-it.com >>> bpanet.de >>> edns.de >>> back2web.de >>> waaaouh.com >>> every-web.com >>> w3sexe.com >>> gratuits-web.com >>> france-mateur.com >>> pliagedepapier.com >>> immobilieretparticuliers.com >>> chronobio.com >>> stickers-origines.com >>> tailor-made.co.uk >>> >>> With these out of the input files, we are able to start the next >>> crawl. 
* >>> I hope I have not missed any complaints, but if I have, this is
>>> advance notice that we might receive a few emails in the forthcoming
>>> weeks. We might also receive a few emails from sites that have
>>> appeared on the Alexa 1 million list since 2010.*
>>>
>>> Back to this list, the excellent filtering program which was used to
>>> process the original list and clean it up was used again for the
>>> modern list. The Alexa list used to contain a number of domain names
>>> which were actually sub-directories, as well as some invalid
>>> domains. Alexa has since tightened its act. The latest Alexa list is
>>> much cleaner: it holds 999998 valid domains vs. 984587 domains for
>>> the original 2010 list.
>>>
>>> Finally, new gTLDs have now appeared in the Alexa list, including
>>> some Internationalised Domain Names (IDNs). The world is indeed a
>>> very different place!
>>> It will be interesting to see how the Crawler, as well as all the
>>> other scripts that process the information into displayable data on
>>> the Web server, will cope with these:
>>>
>>> academy.csv consulting.csv guide.csv one.csv supply.csv
>>> accountant.csv contractors.csv guru.csv onl.csv support.csv
>>> actor.csv cool.csv hamburg.csv online.csv surf.csv
>>> ads.csv country.csv haus.csv ooo.csv swiss.csv
>>> adult.csv creditcard.csv healthcare.csv orange.csv sydney.csv
>>> agency.csv cricket.csv help.csv ovh.csv systems.csv
>>> alsace.csv cymru.csv hiphop.csv paris.csv taipei.csv
>>> amsterdam.csv dance.csv holiday.csv partners.csv tattoo.csv
>>> app.csv date.csv horse.csv parts.csv team.csv
>>> archi.csv dating.csv host.csv party.csv tech.csv
>>> associates.csv deals.csv hosting.csv photo.csv technology.csv
>>> attorney.csv delivery.csv house.csv photography.csv theater.csv
>>> auction.csv desi.csv how.csv photos.csv tienda.csv
>>> audio.csv design.csv immobilien.csv pics.csv tips.csv
>>> axa.csv dev.csv immo.csv pictures.csv tirol.csv
>>> barclaycard.csv diet.csv ink.csv pink.csv today.csv
>>> barclays.csv digital.csv international.csv pizza.csv tokyo.csv
>>> bar.csv direct.csv investments.csv place.csv tools.csv
>>> bargains.csv directory.csv irish.csv plus.csv top.csv
>>> bayern.csv discount.csv jetzt.csv poker.csv town.csv
>>> beer.csv dog.csv joburg.csv porn.csv toys.csv
>>> berlin.csv domains.csv juegos.csv post.csv trade.csv
>>> best.csv earth.csv kim.csv press.csv training.csv
>>> bid.csv education.csv kitchen.csv prod.csv trust.csv
>>> bike.csv email.csv kiwi.csv productions.csv university.csv
>>> bio.csv emerck.csv koeln.csv properties.csv uno.csv
>>> black.csv energy.csv krd.csv property.csv uol.csv
>>> blackfriday.csv equipment.csv kred.csv pub.csv vacations.csv
>>> blue.csv estate.csv land.csv quebec.csv vegas.csv
>>> bnpparibas.csv eus.csv law.csv realtor.csv ventures.csv
>>> boo.csv events.csv legal.csv recipes.csv video.csv
>>> boutique.csv exchange.csv life.csv red.csv vision.csv
>>> brussels.csv expert.csv limited.csv rehab.csv voyage.csv
>>> build.csv exposed.csv link.csv reise.csv wales.csv
>>> builders.csv express.csv live.csv reisen.csv wang.csv
>>> business.csv fail.csv lol.csv ren.csv watch.csv
>>> buzz.csv faith.csv london.csv rentals.csv webcam.csv
>>> bzh.csv farm.csv love.csv repair.csv website.csv
>>> cab.csv finance.csv luxury.csv report.csv wien.csv
>>> camera.csv fish.csv management.csv rest.csv wiki.csv
>>> camp.csv fishing.csv mango.csv review.csv win.csv
>>> capital.csv fit.csv market.csv reviews.csv windows.csv
>>> cards.csv fitness.csv marketing.csv rio.csv work.csv
>>> care.csv flights.csv markets.csv rip.csv works.csv
>>> career.csv foo.csv media.csv rocks.csv world.csv
>>> careers.csv football.csv melbourne.csv ruhr.csv wtf.csv
>>> casa.csv forsale.csv menu.csv ryukyu.csv xn--3e0b707e.csv
>>> cash.csv foundation.csv microsoft.csv sale.csv xn--4gbrim.csv
>>> casino.csv frl.csv moda.csv scb.csv xn--80adxhks.csv
>>> center.csv fund.csv moe.csv school.csv xn--80asehdb.csv
>>> ceo.csv futbol.csv monash.csv science.csv xn--90ais.csv
>>> chat.csv gal.csv money.csv scot.csv xn--d1acj3b.csv
>>> church.csv gallery.csv moscow.csv services.csv xn--j1amh.csv
>>> city.csv garden.csv movie.csv sexy.csv xn--p1ai.csv
>>> claims.csv gent.csv nagoya.csv shiksha.csv xn--pgbs0dh.csv
>>> click.csv gift.csv network.csv shoes.csv xn--q9jyb4c.csv
>>> clinic.csv gifts.csv new.csv singles.csv xn--wgbl6a.csv
>>> clothing.csv glass.csv news.csv site.csv xxx.csv
>>> club.csv global.csv nexus.csv social.csv xyz.csv
>>> coach.csv globo.csv ngo.csv software.csv yandex.csv
>>> codes.csv gmail.csv ninja.csv solar.csv yoga.csv
>>> coffee.csv goo.csv nrw.csv solutions.csv yokohama.csv
>>> college.csv goog.csv ntt.csv soy.csv youtube.csv
>>> community.csv google.csv nyc.csv space.csv zone.csv
>>> company.csv graphics.csv office.csv style.csv
>>> computer.csv gratis.csv okinawa.csv sucks.csv
>>>
>>> In the meantime I'd like to cite again the Nile University Crew,
>>> expertly led by Sameh El Ansary, for designing and coding a Crawler
>>> that has been able to cope with sifting through 5 years of DNS junk
>>> with minimal maintenance, save the love and attention I give the
>>> servers by keeping them up to date with patches so they don't end up
>>> toppling over. They haven't been rebooted in 464 days and I am
>>> crossing fingers for their well-being.
>>> And of course, thanks to the University of Southampton Crew who
>>> built the excellent 2nd version of the Web Site under Tim Chown's
>>> supervision.
>>>
>>> I am still writing an article for RIPE Labs - just struggling to
>>> find the time to finish it, but getting there.
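The clean-up described above - stripping sub-directory entries, rejecting invalid names, and (as discussed earlier in the thread) dropping opted-out domains - can be sketched roughly as follows. The regex, function name and the tiny opt-out set are purely illustrative; this is not the project's actual filtering program:

```python
# Hedged sketch of the kind of input-list clean-up described in the
# thread; names and rules are illustrative, not the project's code.
import re

# Crude "looks like a registered domain" check: dot-separated labels.
DOMAIN_RE = re.compile(
    r"^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9-]{2,63}$")
OPT_OUT = {"saga.co.uk", "tcs.com"}  # subset of the opt-out blacklist

def clean(entry):
    """Normalise an Alexa entry to a bare domain, or None if unusable."""
    domain = entry.strip().lower().split("/")[0]  # drop sub-directory paths
    if not DOMAIN_RE.match(domain) or domain in OPT_OUT:
        return None
    return domain

entries = ["Example.com/forum", "saga.co.uk", "not a domain", "gih.com"]
print([d for d in map(clean, entries) if d])  # ['example.com', 'gih.com']
```

A filter of this shape would also explain the drop from raw Alexa entries to the 999998/984587 "valid domains" counts quoted above.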
>>>
>>> Warmest regards,
>>>
>>> Olivier
>>>
>>>
>>>
>>> _______________________________________________
>>> IPv6crawler-wg mailing list
>>> IPv6crawler-wg at gih.co.uk
>>> http://gypsy.gih.co.uk/mailman/listinfo/ipv6crawler-wg
>>
>> --
>> Olivier MJ Crépin-Leblond, PhD
>> http://www.gih.com/ocl.html
>
> --
> Christian de Larrinaga FBCS, CITP,
> -------------------------
> @ FirstHand
> -------------------------
> +44 7989 386778
> cdel at firsthand.net
> -------------------------
>

--
Olivier MJ Crépin-Leblond, PhD
http://www.gih.com/ocl.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cdel at firsthand.net Sat Feb 6 17:53:47 2016
From: cdel at firsthand.net (Christian de Larrinaga)
Date: Sat, 06 Feb 2016 17:53:47 +0000
Subject: [IPv6crawler-wg] An important update about the IPv6 Matrix Project
In-Reply-To: <56B62BF4.8050901@gih.com>
References: <56575051.3070402@gih.com> <56B62022.7060009@gih.com> <56B628E2.5090601@firsthand.net> <56B62BF4.8050901@gih.com>
Message-ID: <56B6332B.3090503@firsthand.net>

ahh I know that list ;-)

Olivier MJ Crepin-Leblond wrote:
> Hello Christian,
>
> the SQLite database comes in when it comes to displaying the
> results. The results of the crawls are in native CSV. All 306Gb of
> these. The Sqlite database is much smaller as it only uses a subset of
> all data collected (the data which is used in the GUI) and we are not
> using a single Sqlite database but one for each crawl - a summary of
> each crawl for each TLD.
> The question of Sqlite v3 is a good one -- and I have unfortunately
> got no idea whether it would work or whether it would break things. To
> be added to the list of things to do.
> Kindest regards,
>
> Olivier
>
> On 06/02/2016 18:09, Christian de Larrinaga wrote:
>> That is a humongous SQLite database! or are you only collecting
>> the data as a form of cache using sqlite and then exporting it out
>> once organised into csv?
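The split described here - raw CSV as the canonical store, with a small per-crawl SQLite summary feeding the GUI - might look roughly like the sketch below. The column names and schema are invented for illustration; the project's real schema is not shown in the thread:

```python
# Illustrative sketch of a CSV -> per-crawl SQLite summary step.
# Columns and file contents are invented, not the project's schema.
import csv, io, sqlite3

# Stand-in for one crawl's raw CSV output.
raw = io.StringIO(
    "domain,tld,has_ipv6\n"
    "example.com,com,1\n"
    "example.net,net,0\n"
    "example.org,org,1\n")

conn = sqlite3.connect(":memory:")  # one small summary DB per crawl
cur = conn.cursor()
cur.execute(
    "CREATE TABLE summary (tld TEXT PRIMARY KEY,"
    " ipv6_hosts INTEGER, total INTEGER)")

# Aggregate the raw rows into per-TLD counts.
counts = {}
for row in csv.DictReader(raw):
    total, v6 = counts.get(row["tld"], (0, 0))
    counts[row["tld"]] = (total + 1, v6 + int(row["has_ipv6"]))

cur.executemany("INSERT INTO summary VALUES (?, ?, ?)",
                [(tld, v6, total) for tld, (total, v6) in counts.items()])
print(cur.execute(
    "SELECT tld, ipv6_hosts, total FROM summary ORDER BY tld").fetchall())
# [('com', 1, 1), ('net', 0, 1), ('org', 1, 1)]
```

Keeping the CSV canonical and the SQLite databases derived, as Olivier describes, means a broken summary can always be regenerated from the raw data.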
>> Sqlite v3 supports utf-8 which might help?
>> if it doesn't break something else of course.
>>
>> C
>>
>> Olivier MJ Crepin-Leblond wrote:
>>> [...]
>>>
>>> On 26/11/2015 19:32, Olivier MJ Crepin-Leblond wrote:
>>>> [...]
>>>> Nearly 5% of the domain names in the database were unreachable. The
>>>> updated input database should resolve this, but we might also see a
>>>> jump in some results. It will be interesting to see what the next
>>>> run yields.
>>>> Why do we not update the input database more often? Because buried
>>>> in that database are the domain names of the people who wanted to
>>>> opt out over the years. Having never thought about this, I spent
>>>> several hours tracing back 5 years of emails from people complaining
>>>> about the crawl triggering their firewalls. I put together a
>>>> blacklist of domain names which I have manually deleted from the
>>>> crawl input files.
>>>> The blacklist, as it stands now:
>>>>
>>>> Deleted:
>>>>
>>>> it-mate.co.uk
>>>> indianic.com
>>>> your-server.de
>>>> catacombscds.com
>>>> dewlance.com
>>>> tcs.com
>>>> printweb.de
>>>> nocser.net
>>>> shoppingnsales.com
>>>> bsaadmail.com
>>>> epayservice.ru
>>>> 4footyfans.com
>>>> guitarspeed99.com
>>>> saga.co.uk
>>>>
>>>> Already gone from the current Alexa list:
>>>>
>>>> infinityautosurf.com
>>>> canada-traffic.com
>>>> usahitz.com
>>>> jawatankosong.com.my
>>>> 4d.com.my
>>>> fitnessuncovered.co.uk
>>>> kualalumpurbookfair.com
>>>> xgen-it.com
>>>> bpanet.de
>>>> edns.de
>>>> back2web.de
>>>> waaaouh.com
>>>> every-web.com
>>>> w3sexe.com
>>>> gratuits-web.com
>>>> france-mateur.com
>>>> pliagedepapier.com
>>>> immobilieretparticuliers.com
>>>> chronobio.com
>>>> stickers-origines.com
>>>> tailor-made.co.uk
>>>>
>>>> With these out of the input files, we are able to start the next
>>>> crawl.
>>>> [...]
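Two details from the exchange above can be demonstrated concretely: the xn-- labels are Punycode for non-ASCII names, and one plausible (unconfirmed) explanation for SQLite "coughing" on the double dash is SQL built by string concatenation, where `--` starts a comment and truncates the statement:

```python
# Illustrative sketch only: the actual failure in the project's database
# builder is not confirmed; this shows one way "--" can break SQL.
import sqlite3

# Each xn-- label is the Punycode (ASCII) form of a non-ASCII name:
print(b"xn--p1ai".decode("idna"))  # the Cyrillic ccTLD of the Russian Federation

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A TLD spliced into SQL as a bare identifier: "--" starts a comment,
# so the statement is truncated and fails to parse.
try:
    cur.execute("CREATE TABLE xn--p1ai (domain TEXT)")
except sqlite3.OperationalError as err:
    print("unquoted identifier failed:", err)

# Double-quoting the identifier (or better, using a fixed table name and
# passing domains as bound parameters) keeps the dashes literal.
cur.execute('CREATE TABLE "xn--p1ai" (domain TEXT)')
cur.execute('INSERT INTO "xn--p1ai" VALUES (?)', ("example.xn--p1ai",))
print(cur.execute('SELECT count(*) FROM "xn--p1ai"').fetchone()[0])  # 1
```

If this is indeed the failure mode, quoting identifiers everywhere (or normalising IDN labels before using them in names, e.g. mapping `--` to `_`) would be a workaround worth trying for the 760 isolated IDN entries.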
>>>>
>>>> Warmest regards,
>>>>
>>>> Olivier
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> IPv6crawler-wg mailing list
>>>> IPv6crawler-wg at gih.co.uk
>>>> http://gypsy.gih.co.uk/mailman/listinfo/ipv6crawler-wg
>>>
>>> --
>>> Olivier MJ Crépin-Leblond, PhD
>>> http://www.gih.com/ocl.html
>>
>> --
>> Christian de Larrinaga FBCS, CITP,
>> -------------------------
>> @ FirstHand
>> -------------------------
>> +44 7989 386778
>> cdel at firsthand.net
>> -------------------------
>>
>
> --
> Olivier MJ Crépin-Leblond, PhD
> http://www.gih.com/ocl.html

--
Christian de Larrinaga FBCS, CITP,
-------------------------
@ FirstHand
-------------------------
+44 7989 386778
cdel at firsthand.net
-------------------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tjc at ecs.soton.ac.uk Sun Feb 7 18:44:29 2016
From: tjc at ecs.soton.ac.uk (Tim Chown)
Date: Sun, 7 Feb 2016 18:44:29 +0000
Subject: [IPv6crawler-wg] An important update about the IPv6 Matrix Project
In-Reply-To: <56B62BF4.8050901@gih.com>
References: <56575051.3070402@gih.com> <56B62022.7060009@gih.com> <56B628E2.5090601@firsthand.net> <56B62BF4.8050901@gih.com>
Message-ID:

Hi,

This is pretty cool, and the db is slowly but surely making its way into Big Data territory ;)

The internationalised domain name problem is also interesting. Christian's solution sounds good.

It seems timely to have another push on both virtualising the system (so we can run it from other vantage points) and distributing the data / results to minimise any potential loss of the increasingly valuable data set.

This might fit well with the growing web observatory activity in Southampton. We also have a new highly resilient data centre in Fareham which could be a good place for the virtualised copy to be hosted. If you're OK with it, I can make some contacts to initiate, but let me know.
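Distributing copies of the data set, as suggested here, pairs naturally with shipping a checksum manifest alongside each mirrored copy. This is purely a sketch of the idea, not a procedure the project actually uses:

```python
# Illustrative: build a SHA-256 manifest for a set of CSV files so a
# mirror can verify its copy of the data set is uncorrupted.
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Stand-ins for per-TLD result files.
files = {"com.csv": b"example.com,1\n", "net.csv": b"example.net,0\n"}
manifest = {name: sha256(blob) for name, blob in files.items()}

# A mirror recomputes each hash and compares against the manifest:
print(all(sha256(files[n]) == d for n, d in manifest.items()))  # True

# Corruption or truncation in transit is detected:
files["net.csv"] = b"truncat"
print(all(sha256(files[n]) == d for n, d in manifest.items()))  # False
```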
Tim

> On 6 Feb 2016, at 17:23, Olivier MJ Crepin-Leblond wrote:
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From cdel at firsthand.net Sun Feb 7 22:41:10 2016
From: cdel at firsthand.net (Christian de Larrinaga)
Date: Sun, 07 Feb 2016 22:41:10 +0000
Subject: [IPv6crawler-wg] An important update about the IPv6 Matrix Project
In-Reply-To:
References: <56575051.3070402@gih.com> <56B62022.7060009@gih.com> <56B628E2.5090601@firsthand.net> <56B62BF4.8050901@gih.com>
Message-ID: <56B7C806.5060902@firsthand.net>

Actually looking into whether to formalise the Matrix as a web observatory is not a bad idea. Should I ask Thanassis or Wendy?
Christian Tim Chown wrote: > Hi, > > This is pretty cool, and the db is slowly but surely making its way > into Big Data territory ;) > > The internationalised domain name problem is also interesting. > Christian?s solution sounds good. > > It seems timely to have another push on both virtualising the system > (so we can run it from other vantage points) and distributing the data > / results to minimise any potential of any loss to the increasingly > valuable data set. > > This might fit well with the growing web observatory activity in > Southampton. We also have a new highly resilient data centre in > Fareham which could be a good place for the virtualised copy to be > hosted. If you?re OK with it, I can make some contacts to initiate, > but let me know. > > Tim > > > >> On 6 Feb 2016, at 17:23, Olivier MJ Crepin-Leblond > > wrote: >> >> Hello Christian, >> >> the the sqlite database comes in when it comes down to displaying the >> results. The results of the crawls are in native CSV. All 306Gb of >> these. The Sqlite database is much smaller as it only uses a subset >> of all data collected (the data which is used in the GUI) and we are >> not using a single Sqlite database but one for each crawl - a summary >> of each crawl for each TLD. >> The question of Sqlite v3 is a good one -- and I have unfortunately >> got no idea whether it would work or whether it would break things. >> To be added to the list of things to do. >> Kindest regards, >> >> Olivier >> >> On 06/02/2016 18:09, Christian de Larrinaga wrote: >>> That is a humungous large sqlite database! or are you only >>> collecting the data as a form of cache using sqlite and then >>> exporting it out once organised into csv? >>> >>> Sqlite v3 supports utf-8 which might help? >>> if it doesn't break something else of course. >>> >>> C >>> >>> Olivier MJ Crepin-Leblond wrote: >>>> Hello all, >>>> >>>> another update: the first complete run using the new TLDs has >>>> completed! 
>>>> You can view the results up to February 2016 from >>>> http://www.ipv6matrix.org >>>> >>>> In adding new gTLDs we have hit a snag, although this snag does not >>>> significantly affect overall results since it appears to only >>>> affect a tiny number of domains. >>>> >>>> I am speaking about Internationalized Top Level Domains (IDNs): >>>> >>>> xn--3e0b707e xn--80adxhks xn--90ais xn--j1amh xn--pgbs0dh >>>> xn--wgbl6a >>>> xn--4gbrim xn--80asehdb xn--d1acj3b xn--p1ai xn--q9jyb4c >>>> >>>> Each of these is the ASCII equivalent of a non ASCII domain name. >>>> Whist the Crawler works well with them and we are able to collect >>>> all of the data pertaining to crawls in IDNs, the program that >>>> builds the Database uses SQLite. Until now, database entries made >>>> use of domain names that were ASCII - but IDNs use a double dash >>>> "--" in the domain. SQLite coughs on DASH - so we have not been >>>> able to produce the database needed for the displaying of the >>>> results when including IDNs. >>>> >>>> Until we have a workaround, I have manually isolated data collected >>>> for IDNs, which means we still collect them, but we will not take >>>> them into account in the final database results. As I have said, >>>> this is a tiny subset of domains: 760 entries out of a total of 1 >>>> Million domains. >>>> >>>> I am *still* drafting a very long article for RIPE labs. In fact, >>>> we might publish this in two parts. In the meantime, the results >>>> appear to be somehow consistent with results of other tracking >>>> projects, some of which use other methods to track IPv6 adoption: >>>> >>>> - http://6lab.cisco.com/stats/ >>>> - https://www.vyncke.org/ipv6status/ >>>> - http://www.mrp.net/ipv6_survey/ >>>> >>>> We now have 306 Gb of comma separated value text data in store, >>>> tracing back the spread of the IPv6 Internet since July 2010. >>>> (294Gb in November 2015) >>>> >>>> I look forward to your kind feedback. 
>>>>
>>>> Warmest regards,
>>>> Olivier
>>>>
>>>>
>>>> On 26/11/2015 19:32, Olivier MJ Crepin-Leblond wrote:
>>>>> Hello all,
>>>>>
>>>>> Two worthy pieces of news regarding the IPv6 Matrix Project
>>>>> ( http://www.ipv6matrix.org ):
>>>>>
>>>>> 1. I have updated the Web site with the latest results, ending in
>>>>> late October - hence a Crawl display date of November 2015.
>>>>> We now have 294 GB of comma-separated value text data in store,
>>>>> tracing the spread of the IPv6 Internet back to July 2010.
>>>>> Altogether, we ran the test approximately 36 times on all 1
>>>>> million of Alexa's busiest domain names. This represented testing
>>>>> of about 6.5 million hosts, carefully collecting traceroute
>>>>> information for each and every one of them. We now have a unique
>>>>> database showing the spread of IPv6 Internet information sources
>>>>> worldwide.
>>>>>
>>>>> 2. Today I took out my very dusty Linux & Python gloves and
>>>>> performed a much-needed update to the IPv6 Matrix Crawler input
>>>>> database, including the Alexa 1 million list as well as the GeoIP
>>>>> databases.
>>>>>
>>>>> Indeed, the Alexa database of the world's 1 million busiest Web
>>>>> sites dated from the Crawler's inception in the first half of
>>>>> 2010. We're more than 5 years later!
>>>>>
>>>>> In a way, keeping the same input database kept the base of crawls
>>>>> steady, and thus made it possible to compare results. However,
>>>>> the flip side of the coin is that we ended up with more and more
>>>>> domain names marked as dysfunctional. Nearly 5% of the domain
>>>>> names in the database were unreachable.
>>>>> The updated input database should resolve this, but we might also
>>>>> see a jump in some results. It will be interesting to see what
>>>>> the next run yields.
>>>>> Why do we not update the input database more often?
>>>>> Because buried in that database are the domain names of the
>>>>> people who asked to opt out over the years. Having never thought
>>>>> about this, I spent several hours tracing back through 5 years of
>>>>> emails from people complaining about the crawl triggering their
>>>>> firewalls. I put together a blacklist of domain names which I
>>>>> have manually deleted from the crawl input files.
>>>>> The blacklist, as it stands now:
>>>>>
>>>>> Deleted:
>>>>>
>>>>> it-mate.co.uk
>>>>> indianic.com
>>>>> your-server.de
>>>>> catacombscds.com
>>>>> dewlance.com
>>>>> tcs.com
>>>>> printweb.de
>>>>> nocser.net
>>>>> shoppingnsales.com
>>>>> bsaadmail.com
>>>>> epayservice.ru
>>>>> 4footyfans.com
>>>>> guitarspeed99.com
>>>>> saga.co.uk
>>>>>
>>>>> Already gone from the current Alexa list:
>>>>>
>>>>> infinityautosurf.com
>>>>> canada-traffic.com
>>>>> usahitz.com
>>>>> jawatankosong.com.my
>>>>> 4d.com.my
>>>>> fitnessuncovered.co.uk
>>>>> kualalumpurbookfair.com
>>>>> xgen-it.com
>>>>> bpanet.de
>>>>> edns.de
>>>>> back2web.de
>>>>> waaaouh.com
>>>>> every-web.com
>>>>> w3sexe.com
>>>>> gratuits-web.com
>>>>> france-mateur.com
>>>>> pliagedepapier.com
>>>>> immobilieretparticuliers.com
>>>>> chronobio.com
>>>>> stickers-origines.com
>>>>> tailor-made.co.uk
>>>>>
>>>>> With these out of the input files, we are able to start the next
>>>>> crawl.
>>>>> *I hope I have not missed any complaints, but if I have, this is
>>>>> advance notice that we might receive a few emails in the
>>>>> forthcoming weeks. We might also receive a few emails from sites
>>>>> that have appeared on the Alexa 1 million list since 2010.*
>>>>>
>>>>> Back to this list: the excellent filtering program that was used
>>>>> to process and clean up the original list was used again on the
>>>>> new list. The old Alexa list had a number of domain names which
>>>>> were actually sub-directories, as well as some invalid domains.
>>>>> Alexa has since tightened its act.
>>>>> The latest Alexa list is much cleaner. It holds 999998 valid
>>>>> domains vs. 984587 domains for the original 2010 list.
>>>>>
>>>>> Finally, new gTLDs have now appeared in the Alexa list, including
>>>>> some Internationalised Domain Names (IDNs). The world is indeed a
>>>>> very different place!
>>>>> It will be interesting to see how the Crawler, as well as all the
>>>>> other scripts that process the information into displayable data
>>>>> on the Web server, will cope with these:
>>>>>
>>>>> academy.csv consulting.csv guide.csv one.csv supply.csv
>>>>> accountant.csv contractors.csv guru.csv onl.csv support.csv
>>>>> actor.csv cool.csv hamburg.csv online.csv surf.csv
>>>>> ads.csv country.csv haus.csv ooo.csv swiss.csv
>>>>> adult.csv creditcard.csv healthcare.csv orange.csv sydney.csv
>>>>> agency.csv cricket.csv help.csv ovh.csv systems.csv
>>>>> alsace.csv cymru.csv hiphop.csv paris.csv taipei.csv
>>>>> amsterdam.csv dance.csv holiday.csv partners.csv tattoo.csv
>>>>> app.csv date.csv horse.csv parts.csv team.csv
>>>>> archi.csv dating.csv host.csv party.csv tech.csv
>>>>> associates.csv deals.csv hosting.csv photo.csv technology.csv
>>>>> attorney.csv delivery.csv house.csv photography.csv theater.csv
>>>>> auction.csv desi.csv how.csv photos.csv tienda.csv
>>>>> audio.csv design.csv immobilien.csv pics.csv tips.csv
>>>>> axa.csv dev.csv immo.csv pictures.csv tirol.csv
>>>>> barclaycard.csv diet.csv ink.csv pink.csv today.csv
>>>>> barclays.csv digital.csv international.csv pizza.csv tokyo.csv
>>>>> bar.csv direct.csv investments.csv place.csv tools.csv
>>>>> bargains.csv directory.csv irish.csv plus.csv top.csv
>>>>> bayern.csv discount.csv jetzt.csv poker.csv town.csv
>>>>> beer.csv dog.csv joburg.csv porn.csv toys.csv
>>>>> berlin.csv domains.csv juegos.csv post.csv trade.csv
>>>>> best.csv earth.csv kim.csv press.csv training.csv
>>>>> bid.csv education.csv kitchen.csv prod.csv trust.csv
>>>>> bike.csv email.csv kiwi.csv productions.csv university.csv
>>>>> bio.csv emerck.csv koeln.csv properties.csv uno.csv
>>>>> black.csv energy.csv krd.csv property.csv uol.csv
>>>>> blackfriday.csv equipment.csv kred.csv pub.csv vacations.csv
>>>>> blue.csv estate.csv land.csv quebec.csv vegas.csv
>>>>> bnpparibas.csv eus.csv law.csv realtor.csv ventures.csv
>>>>> boo.csv events.csv legal.csv recipes.csv video.csv
>>>>> boutique.csv exchange.csv life.csv red.csv vision.csv
>>>>> brussels.csv expert.csv limited.csv rehab.csv voyage.csv
>>>>> build.csv exposed.csv link.csv reise.csv wales.csv
>>>>> builders.csv express.csv live.csv reisen.csv wang.csv
>>>>> business.csv fail.csv lol.csv ren.csv watch.csv
>>>>> buzz.csv faith.csv london.csv rentals.csv webcam.csv
>>>>> bzh.csv farm.csv love.csv repair.csv website.csv
>>>>> cab.csv finance.csv luxury.csv report.csv wien.csv
>>>>> camera.csv fish.csv management.csv rest.csv wiki.csv
>>>>> camp.csv fishing.csv mango.csv review.csv win.csv
>>>>> capital.csv fit.csv market.csv reviews.csv windows.csv
>>>>> cards.csv fitness.csv marketing.csv rio.csv work.csv
>>>>> care.csv flights.csv markets.csv rip.csv works.csv
>>>>> career.csv foo.csv media.csv rocks.csv world.csv
>>>>> careers.csv football.csv melbourne.csv ruhr.csv wtf.csv
>>>>> casa.csv forsale.csv menu.csv ryukyu.csv xn--3e0b707e.csv
>>>>> cash.csv foundation.csv microsoft.csv sale.csv xn--4gbrim.csv
>>>>> casino.csv frl.csv moda.csv scb.csv xn--80adxhks.csv
>>>>> center.csv fund.csv moe.csv school.csv xn--80asehdb.csv
>>>>> ceo.csv futbol.csv monash.csv science.csv xn--90ais.csv
>>>>> chat.csv gal.csv money.csv scot.csv xn--d1acj3b.csv
>>>>> church.csv gallery.csv moscow.csv services.csv xn--j1amh.csv
>>>>> city.csv garden.csv movie.csv sexy.csv xn--p1ai.csv
>>>>> claims.csv gent.csv nagoya.csv shiksha.csv xn--pgbs0dh.csv
>>>>> click.csv gift.csv network.csv shoes.csv xn--q9jyb4c.csv
>>>>> clinic.csv gifts.csv new.csv singles.csv xn--wgbl6a.csv
>>>>> clothing.csv glass.csv news.csv site.csv xxx.csv
>>>>> club.csv global.csv nexus.csv social.csv xyz.csv
>>>>> coach.csv globo.csv ngo.csv software.csv yandex.csv
>>>>> codes.csv gmail.csv ninja.csv solar.csv yoga.csv
>>>>> coffee.csv goo.csv nrw.csv solutions.csv yokohama.csv
>>>>> college.csv goog.csv ntt.csv soy.csv youtube.csv
>>>>> community.csv google.csv nyc.csv space.csv zone.csv
>>>>> company.csv graphics.csv office.csv style.csv
>>>>> computer.csv gratis.csv okinawa.csv sucks.csv
>>>>>
>>>>> In the meantime, I'd like to cite again the Nile University Crew,
>>>>> expertly led by Sameh El Ansary, for designing and coding a
>>>>> Crawler that's been able to cope with sifting through 5 years of
>>>>> DNS junk with minimal maintenance, save the love and attention I
>>>>> give the servers by keeping them up to date with patches so they
>>>>> don't end up toppling over. They haven't been rebooted in 464
>>>>> days and I am crossing fingers for their well-being.
>>>>> And of course, thanks to the University of Southampton Crew who
>>>>> built the excellent 2nd version of the Web site under Tim Chown's
>>>>> supervision.
>>>>>
>>>>> I am still writing an article for RIPE Labs - just struggling to
>>>>> find the time to finish it, but getting there.
>>>>>
>>>>> Warmest regards,
>>>>>
>>>>> Olivier
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> IPv6crawler-wg mailing list
>>>>> IPv6crawler-wg at gih.co.uk
>>>>> http://gypsy.gih.co.uk/mailman/listinfo/ipv6crawler-wg
>>>>
>>>> --
>>>> Olivier MJ Crépin-Leblond, PhD
>>>> http://www.gih.com/ocl.html
>>>
>>> --
>>> Christian de Larrinaga FBCS, CITP,
>>> -------------------------
>>> @ FirstHand
>>> -------------------------
>>> +44 7989 386778
>>> cdel at firsthand.net
>>> -------------------------
>>
>> --
>> Olivier MJ Crépin-Leblond, PhD
>> http://www.gih.com/ocl.html
>

--
Christian de Larrinaga FBCS, CITP,
-------------------------
@ FirstHand
-------------------------
+44 7989 386778
cdel at firsthand.net
-------------------------
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
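One plausible explanation for the "SQLite chokes on the double dash" symptom described in the thread: in SQL, "--" begins a line comment, so if the per-TLD summary databases build any SQL with the Punycode label pasted in unquoted (for instance in a table name), everything from the "--" onward is silently treated as a comment and the statement becomes invalid. A minimal sketch of the problem and the fix, assuming a hypothetical per-TLD table layout (the real schema of the IPv6 Matrix databases is not shown in the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# An IDN TLD as it appears in the Alexa list: a Punycode "A-label"
# containing a double dash (xn--p1ai is the ASCII form of .рф).
tld = "xn--p1ai"

# Pasting the label into SQL unquoted fails: the tokenizer reads
# "CREATE TABLE xn" and discards "--p1ai (domain TEXT)" as a comment,
# leaving an incomplete statement.
try:
    conn.execute(f"CREATE TABLE {tld} (domain TEXT)")
    unquoted_ok = True
except sqlite3.Error as exc:
    unquoted_ok = False
    print("unquoted identifier fails:", exc)

# Double-quoting the identifier makes the same name legal, and data
# values are bound as parameters rather than concatenated into SQL.
conn.execute(f'CREATE TABLE "{tld}" (domain TEXT)')
conn.execute(f'INSERT INTO "{tld}" (domain) VALUES (?)',
             ("example.xn--p1ai",))

row = conn.execute(f'SELECT domain FROM "{tld}"').fetchone()
print(row[0])  # example.xn--p1ai
```

As Christian notes, SQLite 3 stores text as UTF-8, so once identifiers are quoted and values are bound as parameters, the Unicode U-label form of these domains could in principle be stored directly as well (Python's built-in idna codec converts between the two forms).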