zundel

Monday 297

Taco Bell Programming

Filed under: Computers — Tags: — zundel @ am

Moving on, once you have these millions of pages (or even tens of millions), how do you process them? Surely Hadoop MapReduce is necessary; after all, that’s what Google uses to parse the web, right?

Pfft, fuck that noise:

find crawl_dir/ -type f -print0 | xargs -n1 -0 -P32 ./process

32 concurrent parallel parsing processes and zero bullshit to manage. Requirement satisfied.

It may not get you invited to speak at conferences, but it will get the job done, and help keep your pager from going off at night.
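
To try the one-liner without a real crawl, here is a minimal sketch under a few assumptions: the post never shows what ./process does, so the worker below is a made-up stand-in that just counts bytes, and the 100 fake pages and the out/ directory are invented for illustration. Note that -print0, -0 and -P are GNU/BSD extensions to find and xargs, not plain POSIX.

```shell
#!/bin/sh
# Sketch only: crawl_dir and ./process are the names from the post, but the
# body of ./process and the out/ directory are hypothetical stand-ins.
set -eu

# Scratch directory, left in place afterwards so the results can be inspected.
work=$(mktemp -d)
cd "$work"
mkdir -p crawl_dir out

# Fake crawl: 99 small pages plus one file with spaces in its name.
i=1
while [ "$i" -le 99 ]; do
    printf '<html>page %s</html>\n' "$i" > "crawl_dir/page_$i.html"
    i=$((i + 1))
done
printf '<html>odd name</html>\n' > "crawl_dir/page with space.html"

# Hypothetical worker: xargs hands it exactly one path as $1.
cat > process <<'EOF'
#!/bin/sh
# One output file per input, so parallel workers never write to the same file.
name=$(basename "$1")
wc -c < "$1" > "out/$name.count"
EOF
chmod +x process

# The pipeline from the post: NUL-delimited paths, one file per invocation,
# at most 32 workers running at a time.
find crawl_dir/ -type f -print0 | xargs -n1 -0 -P32 ./process

processed=$(ls out | wc -l | tr -d ' ')
echo "processed $processed files"
```

The -print0 / -0 pair is what keeps the file with spaces in its name in one piece, and -n1 is what makes each worker see a single path. Having every worker write its own output file is a choice made for this sketch, so that 32 processes are not interleaving writes into one stream.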
