<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" version="XHTML+RDFa 1.0" dir="ltr" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:og="http://ogp.me/ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:sioc="http://rdfs.org/sioc/ns#" xmlns:sioct="http://rdfs.org/sioc/types#" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#"> <head profile="http://www.w3.org/1999/xhtml/vocab"> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <link rel="shortlink" href="/node/1876" /> <meta name="Generator" content="Drupal 7 (http://drupal.org)" /> <link rel="canonical" href="/show/23" /> <title>Increase the number of nodes in the 64 node cluster. | DICE development projects</title> <style type="text/css" media="all"> @import url("/modules/system/system.base.css?nwnzct"); @import url("/modules/system/system.menus.css?nwnzct"); @import url("/modules/system/system.messages.css?nwnzct"); @import url("/modules/system/system.theme.css?nwnzct"); </style> <style type="text/css" media="all"> @import url("/modules/comment/comment.css?nwnzct"); @import url("/modules/field/theme/field.css?nwnzct"); @import url("/modules/node/node.css?nwnzct"); @import url("/modules/search/search.css?nwnzct"); @import url("/modules/user/user.css?nwnzct"); @import url("/sites/all/modules/views/css/views.css?nwnzct"); </style> <style type="text/css" media="all"> @import url("/sites/all/modules/ctools/css/ctools.css?nwnzct"); </style> <style type="text/css" media="all"> @import url("/sites/default/files/color/bluemarine-8db4c44d/style.css?nwnzct"); </style> <script type="text/javascript" src="http://devproj.inf.ed.ac.uk/misc/jquery.js?v=1.4.4"></script> <script type="text/javascript" 
src="http://devproj.inf.ed.ac.uk/misc/jquery.once.js?v=1.2"></script> <script type="text/javascript" src="http://devproj.inf.ed.ac.uk/misc/drupal.js?nwnzct"></script> <script type="text/javascript"> <!--//--><![CDATA[//><!-- jQuery.extend(Drupal.settings, {"basePath":"\/","pathPrefix":"","ajaxPageState":{"theme":"bluemarine","theme_token":"MpxxCtYbVAiDofIsY-RwAh2_8-pUJkOgl3n7UOKqt04","js":{"misc\/jquery.js":1,"misc\/jquery.once.js":1,"misc\/drupal.js":1},"css":{"modules\/system\/system.base.css":1,"modules\/system\/system.menus.css":1,"modules\/system\/system.messages.css":1,"modules\/system\/system.theme.css":1,"modules\/comment\/comment.css":1,"modules\/field\/theme\/field.css":1,"modules\/node\/node.css":1,"modules\/search\/search.css":1,"modules\/user\/user.css":1,"sites\/all\/modules\/views\/css\/views.css":1,"sites\/all\/modules\/ctools\/css\/ctools.css":1,"sites\/all\/themes\/bluemarine\/style.css":1}},"urlIsAjaxTrusted":{"\/node\/1876":true}}); //--><!]]> </script> </head> <body class="html not-front not-logged-in no-sidebars page-node page-node- page-node-1876 node-type-project" > <div id="skip-link"> <a href="#main-content" class="element-invisible element-focusable">Skip to main content</a> </div> <div id="header" class="clearfix"> <div class="region region-search"> <div id="block-search-form" class="block block-search"> <div class="content"> <form action="/node/1876" method="post" id="search-block-form" accept-charset="UTF-8"><div><div class="container-inline"> <h2 class="element-invisible">Search form</h2> <div class="form-item form-type-textfield form-item-search-block-form"> <label class="element-invisible" for="edit-search-block-form--2">Search </label> <input title="Enter the terms you wish to search for." 
type="text" id="edit-search-block-form--2" name="search_block_form" value="" size="15" maxlength="128" class="form-text" /> </div> <div class="form-actions form-wrapper" id="edit-actions"><input type="submit" id="edit-submit" name="op" value="Search" class="form-submit" /></div><input type="hidden" name="form_build_id" value="form-5t7sD71r70Awd1K68hEhh0q7pXh2PeBvBd9yNGkF4_0" /> <input type="hidden" name="form_id" value="search_block_form" /> </div> </div></form> </div> </div> </div> <a href="/" title="Home" rel="home" id="logo"> <img src="http://devproj.inf.ed.ac.uk/sites/default/files/EdUniInfLogo.png" alt="Home" /> </a> <div id="menu"> </div> </div> <div class="layout-columns clearfix"> <div id="main" class="column"> <div class="inner"> <h2 class="element-invisible">You are here</h2><div class="breadcrumb"><a href="/">Home</a></div> <h1 class="title" id="page-title">Increase the number of nodes in the 64 node cluster.</h1> <div class="tabs"></div> <div class="region region-content"> <div id="block-system-main" class="block block-system"> <div class="content"> <div id="node-1876" class="node node-project clearfix" about="/show/23" typeof="sioc:Item foaf:Document"> <span property="dc:title" content="Increase the number of nodes in the 64 node cluster." 
class="rdf-meta element-hidden"></span><span property="sioc:num_replies" content="0" datatype="xsd:integer" class="rdf-meta element-hidden"></span> <span class="submitted"><span property="dc:date dc:created" content="2013-01-25T15:45:34+00:00" datatype="xsd:dateTime">Fri, 01/25/2013 - 15:45</span> - <span rel="sioc:has_creator"><span class="username" xml:lang="" about="/users/boss" typeof="sioc:UserAccount" property="foaf:name" datatype="">boss</span></span></span> <div class="content"> <div class="field field-name-field-projectid field-type-serial field-label-inline clearfix"><div class="field-label">Project ID: </div><div class="field-items"><div class="field-item even">23</div></div></div><div class="field field-name-field-current-stage field-type-taxonomy-term-reference field-label-inline clearfix"><div class="field-label">Current stage: </div><div class="field-items"><div class="field-item even"><a href="/project-stages/5completed" typeof="skos:Concept" property="rdfs:label skos:prefLabel" datatype="">5_Completed</a></div></div></div><div class="field field-name-field-manager field-type-taxonomy-term-reference field-label-inline clearfix"><div class="field-label">Manager: </div><div class="field-items"><div class="field-item even"><a href="/project-managers/iainr" typeof="skos:Concept" property="rdfs:label skos:prefLabel" datatype="">iainr</a></div></div></div><div class="field field-name-field-unit field-type-taxonomy-term-reference field-label-inline clearfix"><div class="field-label">Unit: </div><div class="field-items"><div class="field-item even"><a href="/unit/rat-unit" typeof="skos:Concept" property="rdfs:label skos:prefLabel" datatype="">rat-unit</a></div></div></div><div class="field field-name-field-what field-type-text-long field-label-above"><div class="field-label">What: </div><div class="field-items"><div class="field-item even"><p><b>Description: </b> There is currently space in the beowulf racking for additional nodes for this cluster. 
With the addition of a shelf in rack 11 and utilising existing spare network kit, it should be possible to add up to 16 desktop machines to lion. These nodes would act as a bank of hot spares and at the same time could be used to run short jobs on individual nodes.</p> <p>This would make use of equipment currently sitting in storage, improve the usability of the cluster and provide some more capacity for some cluster users.</p> <p><b>Deliverables: </b> An additional 10-16 nodes for the lion cluster.</p> </div></div></div><div class="field field-name-field-why field-type-text-long field-label-above"><div class="field-label">Why: </div><div class="field-items"><div class="field-item even"><p><b>Customer: </b> The gridengine userbase: research staff and MSc/PhD students.</p> <p><b>Case statement: </b> Lion's nodes are now 4+ years old and we are seeing more unexplained glitches and the first outright hardware failures. Until now the nodes have been remarkably reliable and it has been possible to maintain the cluster using two "cold spares". Recently the frequency of problems has reached the stage where, on average, one or two nodes may be unavailable during the week, which causes problems for users who need to run jobs on a large number (or all) of the nodes.</p> <p>Adding nodes would both reduce the pressure on what is still a heavily used cluster and reduce the pressure on the support staff to maintain 100% uptime on all the nodes. 
This could also act as a trial for replacing a percentage of the cluster nodes with old desktop machines on a rolling basis.</p> </div></div></div><div class="field field-name-field-when field-type-text-long field-label-above"><div class="field-label">When: </div><div class="field-items"><div class="field-item even"><p><b>Status: </b> pending approval</p> <p><b>Timescales: </b> This should be done as the clusters are upgraded to FC5, before January 2007.</p> <p><b>Priority: </b> This would be a fairly cheap way to expand the cluster and reduce support overheads a bit using what is essentially surplus kit.</p> <p>If we want to go ahead with this, we should do it now or defer it until we make whatever changes to the clusters are needed when we move to the Forum.</p> <p><b>Time: </b> Some support/technician time to identify likely spare PCs and transport them to KB.</p> <p>Some technician time to find and install a shelf (this should just be grabbing a BSI standard shelf from 3312 or 2905) and to install some trunking for routing power cables.</p> <p>4-5 days to integrate switch and new nodes into the cluster.</p> </div></div></div><div class="field field-name-field-how field-type-text-long field-label-above"><div class="field-label">How: </div><div class="field-items"><div class="field-item even"><p><b>Proposal: </b> Redeploy the minibw procurve and up to 16 desktops from storage to create a hot spare pool for the 64 node cluster.</p> <p><b>Resources: </b> The nodes, a BSS shelf, cables and, if we have any, 1 or 2 spare 1Gb cards for a Procurve 4000.</p> <p><b>Plan: </b> </p><p> 1. Fit new shelf and move minibw procurve.</p> <p> 2. Fit bridge trunking between beowulf rack and rack </p> <p> 3. Reorganise cluster power cabling (2 FTE days; during cluster downtime for upgrade).</p> <p> 4. Install nodes (2 FTE days; ideally integrated with the cluster upgrade).</p> <p> 5. 
Add nodes into gridengine configuration (1 FTE day).</p> </div></div></div><div class="field field-name-field-other field-type-text-long field-label-above"><div class="field-label">Other: </div><div class="field-items"><div class="field-item even"><p><b>Dependencies: </b> The project doesn't have any dependencies but ideally should be coordinated with the cluster upgrades to FC5.</p> <p><b>Risks: </b> None foreseen.</p> <p><b>URL:</b> <a href="http://www.dice.inf.ed.ac.uk/units/research_and_teaching/projects/lion_expansion.html">http://www.dice.inf.ed.ac.uk/units/research_and_teaching/projects/lion_e...</a> </p> <p><b>Milestones</b></p> <table><tr><th>Proposed date</th> <th>Achieved date</th> <th>Name</th> <th>Description</th> </tr><tr><td>2007-03-03</td> <td>2007-03-04</td> <td>reconfigure rac</td> <td>Install shelving and move minibw switch to rack 12</td> </tr><tr><td>2007-03-18</td> <td>2007-03-20</td> <td>Source hardware</td> <td>Source redundant desktops from support and install them in racking</td> </tr><tr><td>2007-03-30</td> <td>2007-04-04</td> <td>Integrate nodes</td> <td>Integrate the new nodes into the cluster</td> </tr><tr><td>2007-06-20</td> <td>2007-05-30</td> <td>report</td> <td>Write up a report on the project</td> </tr><tr><td></td> <td>2007-07-14</td> <td>signoff</td> <td>Submit project to be signed off</td> </tr></table></div></div></div> </div> </div> </div> </div> </div> </div> </div> </div> <div id="footer"> <div class="region region-footer"> <div id="block-block-1" class="block block-block"> <div class="content"> <p>Unless explicitly stated otherwise, all material is copyright The University of Edinburgh.</p> </div> </div> </div> </div> </body> </html>