Job Description
West Virginia University's Research Corporation is seeking applications for a Senior HPC Systems Administrator.
About the Opportunity
The High-Performance Computing (HPC) systems administrator is responsible for the day-to-day maintenance and upkeep of three WVU research computing systems: a 178-node CPU-based HPC cluster co-located at the Pittsburgh Supercomputing Center, a 37-node GPU-based HPC cluster in the WVU Chemistry Research Lab, and a 9-node HIPAA-compliant cluster for the West Virginia Clinical and Translational Science Institute (WVCTSI).
We expect that the junior HPC systems administrator will work with senior HPC systems administrators to maintain high standards of computing performance, create and administer user accounts, and ensure data integrity and security compliance (a particularly acute concern for the HIPAA-compliant HPC cluster). The junior HPC systems administrator is also responsible for forecasting needs for new data storage, connectivity, and both software and hardware upgrades.
At WVU Research Corporation, we strongly believe in work-life balance and keeping time for things we love outside our work. WVU Research Corporation offers a comprehensive benefits package with a variety of options to suit your needs:
13 paid holidays (staff holiday calendar)
Paid Time Off (PTO)
403(b) retirement savings with a fully vested 3% employee contribution match (employees have the option of contributing an additional 1-3% of their earnings to the plan, which is also matched by the WVURC)
A range of health insurance and other benefits
Dependent Education Scholarship
WVU Perks
What you'll do:
Create and maintain user accounts.
Plan for and develop processes for future HPC services.
Manage new and existing research data storage systems.
Provide routine and ongoing systems maintenance and upgrades to current HPC clusters, data storage devices, networks, and other related systems.
Understand HIPAA privacy regulations and how they apply to current and future HPC systems at the university.
Monitor and evaluate performance and operational integrity of HPC clusters and other computers used for scientific computing, perform adjustments and configuration changes, modify systems, and optimize their performance.
Interface with faculty users for customer service.
Secure HPC and related systems in accordance with University policies.
Assist in managing and supporting server infrastructure. Most servers run Red Hat Enterprise Linux or binary-compatible Linux distributions.
Perform system upgrades, patch management, and backups.
Monitor system performance and assist in ensuring high availability.
Assist in implementing and maintaining security measures and compliance standards.
Troubleshoot hardware and software issues with guidance from senior staff.
Participate in projects to improve system efficiency and reliability.
Collaborate with other departments to meet business requirements.
Qualifications
Bachelor's degree in Computer Science, Computer Engineering, Electrical & Electronics Engineering, or a related field.
A minimum of two (2) years of experience in system administration or a related role.
Any equivalent combination of education and/or experience will be considered.
All qualifications must be met by the time of employment.
Knowledge, Skills, Abilities
Basic understanding of virtualization technologies (Azure Virtual Desktop, VMware, Hyper-V).
Basic knowledge of networking protocols and services (DNS, DHCP, TCP/IP).
Familiarity with cloud platforms (AWS, Azure, Google Cloud) is a plus.
Familiarity with enterprise-level Unix/Linux systems such as Red Hat Enterprise Linux (RHEL) or similar distributions (e.g., SUSE Linux Enterprise, Rocky Linux, AlmaLinux).
Working knowledge of shell scripting (e.g., bash, csh, ksh).
Knowledge of HPC cluster platforms (Warewulf, Bright Cluster Manager) and open-source provisioning software (Cobbler, Spacewalk, Kickstart).
Familiarity with IT Automation Engines (e.g., Ansible)
Administrator-level familiarity with HPC resource managers (e.g., Slurm)
Knowledge of Network and Distributed Filesystems (e.g., GPFS, Gluster, BeeGFS, Lustre) and Network Filesystem protocols (e.g., NFS, SMB)
Familiarity with Linux-Windows interoperability and authentication (SSSD, winbind, Samba, Microsoft Active Directory, LDAP, NIS).
Good problem-solving and analytical skills.
Strong communication and interpersonal skills.
Ability to work independently and as part of a team.
Detail-oriented with a focus on accuracy and quality.
Ability to manage multiple tasks and prioritize effectively.