
Group Computing: Introducing DICE

Wednesday, 17 December 14 Lukasz Kreczko - PP group meeting

Outline
• Available computing resources
• How to access them
• Future plans
• Summary

DICE

• DICE: Data Intensive Computing Environment
  – An incarnation of DIC: Data Intensive Cluster
• In numbers:
  – 692 cores of raw computing power
  – 560 TB of HDFS* disk space
  – Special: one machine has 128 GB of RAM
  – Special2: 36 cores + 27 TB for BigCouch*
• https://wikis.bris.ac.uk/display/dic/Home

DICE offers

• Apache Hadoop (flavoured by Cloudera):
  – http://hadoop.apache.org/
  – TL;DR: a batch system that is data-aware
  – Designed for a specific workflow (MapReduce) but not limited to it
  – Comes with the Hadoop Distributed File System (HDFS) [next slide]

DICE offers (2)

• HDFS:
  – Redundant file system (spreads over machines & hard drives), DICE default: 2 copies per file
  – Available on all workers + submission nodes* under /hdfs
  – Designed for READ, not WRITE (writes are slower)
• The /hdfs mount is mostly POSIX compliant:
  – ‘cp’ and read will work
  – No streaming of data (ROOT file creation on HDFS)
  – mv/rename only work with the hadoop commands: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.
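Since rename does not work through the /hdfs mount, a move has to go through the Hadoop file system shell. The following is a minimal sketch of how that could be wrapped in Python. It assumes that the /hdfs mount exposes the HDFS root (so /hdfs/user/x corresponds to the HDFS path /user/x) and that the `hadoop` command is on the PATH; the helper names and example paths are hypothetical.

```python
import posixpath
import subprocess

HDFS_MOUNT = "/hdfs"  # mount point named on the slide


def to_hdfs_path(mounted_path):
    """Translate a path under the /hdfs mount into an HDFS path.

    Assumption: the mount exposes the HDFS root directory.
    """
    norm = posixpath.normpath(mounted_path)
    if norm != HDFS_MOUNT and not norm.startswith(HDFS_MOUNT + "/"):
        raise ValueError(f"{mounted_path!r} is not under {HDFS_MOUNT}")
    return norm[len(HDFS_MOUNT):] or "/"


def hadoop_mv_command(src, dst):
    """Build the 'hadoop fs -mv' command for two paths under the mount."""
    return ["hadoop", "fs", "-mv", to_hdfs_path(src), to_hdfs_path(dst)]


def hdfs_move(src, dst):
    """Run the move; raises CalledProcessError if hadoop reports a failure."""
    subprocess.run(hadoop_mv_command(src, dst), check=True)
```

On a machine with Hadoop installed, `hdfs_move("/hdfs/user/alice/a.root", "/hdfs/user/alice/b.root")` would then replace a plain `mv`. Reading and copying can keep using the normal POSIX tools on the mount.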

DICE offers (3)

• Apache CouchDB (BigCouch, Cloudant flavour):
  – http://couchdb.apache.org/
  – NoSQL (not only SQL) database
  – Uses JSON for documents
  – Uses MapReduce for queries
  – Recalculates preset views on the fly
  – API exposed via HTTP
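To illustrate "JSON documents plus an HTTP API", here is a minimal sketch that only builds the requests and does not send them. The server address, the database name `events`, the document fields and the design document and view names are all assumptions made for illustration; the URL layout (`/{db}/{docid}` and `/{db}/_design/{ddoc}/_view/{view}`) follows the standard CouchDB HTTP API.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:5984"  # assumption: not the real DICE address
DATABASE = "events"                 # hypothetical database name


def put_document_request(doc_id, document):
    """Build (but do not send) the HTTP PUT that stores one JSON document."""
    body = json.dumps(document).encode("utf-8")
    url = f"{BASE_URL}/{DATABASE}/{quote(doc_id, safe='')}"
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


def view_url(design_doc, view, **params):
    """Build the URL that queries a preset (MapReduce) view.

    CouchDB expects view parameters as JSON values, e.g. group=true.
    """
    url = f"{BASE_URL}/{DATABASE}/_design/{design_doc}/_view/{view}"
    query = urlencode({key: json.dumps(value) for key, value in params.items()})
    return f"{url}?{query}" if query else url
```

A request built this way could be sent with `urllib.request.urlopen(request)`, and the view URL can be opened in a browser or fetched with any HTTP client.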

How to access DICE

[Diagram: access routes to DICE. Labels: soolin.phy.bris.ac.uk, vm-astro.dice.priv, vm-hep.dice.priv, the LHC grid. DICE provides computing (via HTCondor) and storage (via NFS/HDFS).]

Documentation:
https://htcondor-wiki.cs.wisc.edu/index.cgi/index
https://wikis.bris.ac.uk/display/dic/HTCondor
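Computing on DICE goes through HTCondor, so a job is described in a submit description file and handed to `condor_submit`. The sketch below renders such a file from Python. The submit commands used (`universe`, `executable`, `arguments`, `output`, `error`, `log`, `request_cpus`, `request_memory`, `queue`) are standard HTCondor ones, but the script name, resource values and file names are hypothetical, and any DICE-specific settings from the wiki pages above are not included.

```python
def render_submit_description(executable, arguments="", request_cpus=1,
                              request_memory_mb=1024, n_jobs=1):
    """Return the text of a simple HTCondor submit description file."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # $(Cluster) and $(Process) are expanded by HTCondor per job
        "output = job.$(Cluster).$(Process).out",
        "error = job.$(Cluster).$(Process).err",
        "log = job.$(Cluster).log",
        f"request_cpus = {request_cpus}",
        f"request_memory = {request_memory_mb}",  # in MB by default
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical analysis script; write the file, then run
    # 'condor_submit job.sub' on a machine that can submit to the pool.
    with open("job.sub", "w") as handle:
        handle.write(render_submit_description("run_analysis.sh",
                                               arguments="$(Process)",
                                               n_jobs=10))
```

Passing `$(Process)` as the argument gives each of the queued jobs its own index (0 to 9 here), which is a common way to split a data set across jobs.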

More information

• General information:
  – https://wikis.bris.ac.uk/display/dic/Home

• How to get an account on DICE (& other things):
  – https://wikis.bris.ac.uk/display/dic/FAQ
• Current campaign:
  – https://wikis.bris.ac.uk/display/dic/%28HEP%29+PHYS14+campaign

Summary
• We have a ‘big’ cluster with 692 cores and 560 TB of disk space
• It is currently underused

Help needed to fill this space! Do you need computing?

Thanks for listening. QUESTIONS?

Backup slides

What is DICE?

• A small experimental cluster (20 machines)
• Playground for large data sets (Big Data*):
  – Stored on HDFS*, HBase, (Big)CouchDB
  – Analysed with MapReduce
  – Made easier with Oozie/Hive/Pig
• It is not:
  – A pure CPU/GPU farm (that’s done by BlueCrystal and the LHC grid)

DICE for physics

• In general DICE provides:
  – Reliable (HDFS) and fast storage
  – Built to deal with large data sets (> 10 TB)
  – Easy to extend
  – Fast (current: 10 Gbit, max 30 Gbit) connection to the outside
• The current plans involve:
  – Mounting CMS/LHCb via CVMFS (server in place!)
  – Using HDFS as grid storage (gridFTP and SRM server in preparation)
  – Using xrootd for remote data access (data on demand!)

Hadoop = HDFS + MapReduce

• HDFS = Hadoop Distributed File System
  – Resilient: stores data across machines/racks, automatic replication on failure
  – Fast: data available on multiple machines = faster access
  – Easy: adding/removing nodes is built in (no downtime)

Hadoop = HDFS + MapReduce

• MapReduce:
  – An analysis framework
  – Each job consists of two phases: map & reduce
  – Map = parallel reading + analysing
  – Reduce = combination of map outputs
• (Attempt at a) translation for physicists:
  – Map = what data am I interested in (muon pt, etc.)
  – Reduce = selection + histograms
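The translation for physicists can be sketched in plain Python, without Hadoop. This is a toy illustration only: the event layout (a dict with a `muon_pt` list), the 20 GeV cut and the 10 GeV bin width are assumptions, and a real job would run the map step in parallel over the data stored on HDFS.

```python
from collections import Counter, defaultdict


def map_event(event):
    """Map: pick out the data of interest (here every muon pT in the event)."""
    return [("muon_pt", pt) for pt in event.get("muon_pt", [])]


def reduce_histogram(values, pt_min=20.0, bin_width=10.0):
    """Reduce: apply the selection and fill a histogram {bin lower edge: count}."""
    histogram = Counter()
    for pt in values:
        if pt >= pt_min:  # selection
            histogram[int(pt // bin_width) * int(bin_width)] += 1
    return dict(sorted(histogram.items()))


def run_job(events):
    """Run map over all events, group the outputs by key, then reduce each group."""
    grouped = defaultdict(list)
    for event in events:
        for key, value in map_event(event):
            grouped[key].append(value)
    return {key: reduce_histogram(values) for key, values in grouped.items()}


events = [
    {"muon_pt": [25.0, 8.0]},
    {"muon_pt": [47.5]},
    {"muon_pt": [22.1, 61.0]},
    {},  # event without muons
]
print(run_job(events))  # {'muon_pt': {20: 2, 40: 1, 60: 1}}
```

The map step touches each event independently, which is what lets the framework spread it over many machines; only the much smaller map outputs have to be brought together for the reduce step.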

HBase/Oozie/Hive/Pig

• HBase: BigTable (Google) like database on top of HDFS (http://hbase.apache.org/)
• Oozie: workflow scheduler for Hadoop (http://oozie.apache.org/)
• Pig: platform for analysing data with MapReduce (http://pig.apache.org/)
• Hive: MapReduce with SQL syntax (Facebook, http://hive.apache.org/)

CouchDB
• Interesting possibilities:
  – Once a query (here: a view) is defined, you can see changes as the data comes in
  – Imagine views = histograms
  – Imagine a slider for cuts with live changes to your histograms
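The "views = histograms" idea could look roughly like the sketch below. The design document uses a JavaScript map function that emits a bin edge per document together with CouchDB's built-in `_count` reduce, so querying the view with `group=true` returns one count per bin. The field name `muon_pt`, the view name and the 10 GeV bin width are assumptions. The Python function only emulates locally what such a view would return; it does not talk to a server.

```python
from collections import Counter

# Hypothetical design document, stored under _design/<name> in the database.
# The map function is JavaScript, as CouchDB views require.
HISTOGRAM_DESIGN_DOC = {
    "views": {
        "muon_pt_hist": {
            "map": (
                "function (doc) {"
                " if (typeof doc.muon_pt === 'number') {"
                " emit(Math.floor(doc.muon_pt / 10) * 10, 1);"
                " } }"
            ),
            "reduce": "_count",  # built-in reduce: number of rows per key
        }
    }
}


def emulate_view(docs, bin_width=10):
    """Local emulation of the view queried with group=true: {bin edge: count}."""
    counts = Counter()
    for doc in docs:
        pt = doc.get("muon_pt")
        if isinstance(pt, (int, float)):
            counts[int(pt // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


docs = [{"muon_pt": 12.5}, {"muon_pt": 18.0}, {"muon_pt": 31.2}, {"other": 1}]
print(emulate_view(docs))   # {10: 2, 30: 1}

docs.append({"muon_pt": 35.0})  # new data arrives
print(emulate_view(docs))   # {10: 2, 30: 2}
```

In CouchDB the view is updated as documents come in, which is what would make a histogram that follows the data possible; the emulation above simply recomputes everything on each call.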

DICE Users

• Centre for sustainable energy (Hadoop)

• Landslides (CouchDB)

• Children of the 90’s (CouchDB)

• Physics soon (Hadoop + CouchDB?)

– Your group could be on this list!

More accurate CPU

[Plot: Free CPU]
