zundel

Monday 297

Taco Bell Programming

Filed under: Computers — Tags: — zundel @ am

Taco Bell Programming

Pfft, fuck that noise:
[…]
32 concurrent parallel parsing processes and zero bullshit to manage.

Moving on, once you have these millions of pages (or even tens of millions), how do you process them? Surely, Hadoop MapReduce is necessary, after all, that’s what Google uses to parse the web, right?

Pfft, fuck that noise:

find crawl_dir/ -type f -print0 | xargs -n1 -0 -P32 ./process

32 concurrent parallel parsing processes and zero bullshit to manage. Requirement satisfied.

It may not get you invited to speak at conferences, but it will get the job done, and help keep your pager from going off at night.

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: