Senior Systems Administrator
Senior Linux System Administrator – L2
Experience: 4-7 Years
Endurance International Group, Inc., a leading provider of innovative Internet-based solutions to small and medium-sized businesses, is looking for a dynamic, energetic and bright individual to join our Technical Operations team in our Bangalore office. This is a tremendous and unique opportunity to join a nimble global company that has achieved significant scale over the past fifteen years, yet possesses enormous growth potential. You will play an instrumental role in achieving this growth.
As part of a team that’s leading the next wave of performance innovation at Endurance, you will play an integral part in advancing service assurance and building a culture of technical excellence across the enterprise. Our team is passionate about innovating solutions to help our customers achieve maximum operational uptime and service performance. This position manages a sizable team of monitoring engineers working across multiple time zones.
As part of the Technical Operations team, you will work with highly technical operations engineers, software engineers, and system architects to deliver five-nines uptime with a secure, reliable, and high-performing system.
Work Experience: 4 to 7 Years
Roles and Responsibilities:
- Participate in 12×7 shifts, working with a global team in a follow-the-sun model (no graveyard or night shifts)
- Provide remote infrastructure support for our SaaS, PaaS, IaaS products across globally distributed data centres
- Automate OS and application deployments using tools such as WDS, Cobbler and Puppet
- Conduct regular patch management and system maintenance to ensure the health of platforms/servers
- Set up health checks for systems & applications in monitoring tools like Zabbix, Nagios, SolarWinds, etc.
- Troubleshoot and fix issues while meeting SLAs and operational standards
- Manage incidents and escalations as per policies/procedures to meet incident management and uptime SLAs
- Liaise with engineering teams on RCAs and permanent resolutions of issues, using tickets and chat conference rooms
- Identify repetitive tasks and automate using Bash / Ruby / Perl
- Contribute to operations handbook
- Investigate and assess alerts for hardware and schedule replacement or tests
- Initiate emergency maintenance as needed for failed/failing hardware (drives, RAID controllers, power supplies, etc.)
- Provide guidance to L1 monitoring engineers on hardware status/health and complex server health analysis
- Ensure smooth hand-offs between shifts
You must have strong interpersonal communication skills and the ability to work well in a diverse, team-focused environment
- You must be an expert in Linux/Unix operating system fundamentals, file system troubleshooting, and the various tools/services that are available by default with the operating system
- You are an expert on all things related to keeping a Linux, Apache, MySQL stack up, running, and secure
- You have advanced troubleshooting skills on Dell and Supermicro hardware, along with RAID controllers from Dell, LSI & Adaptec
- You have a passion for system monitoring tools and are a hands-on expert at maintaining tools like Nagios, Zabbix, Cacti, Ganglia, etc.
- You are always looking for opportunities to automate and build tools to expedite repetitive tasks
- Strong grasp of configuration management tools, such as Ansible, Puppet or Chef
- Expert-level scripting/programming skills in any of Bash/Perl/Ruby/Python/Go and a good understanding of regular expressions
- At least 4 years of hosting industry experience working on large clusters of servers running common web hosting panels like cPanel, Plesk, WebsitePanel, Parallels Virtuozzo, etc.
- In-depth knowledge of how the Internet works (HTTP, DNS, streaming, mail, etc.) and experience in configuring and optimizing services (DNS, web server, mail) and the infrastructure necessary to support a dynamic website (load balancers, connecting to databases, etc.)
- Expertise in configuring and using IPsec, VPN, load balancing, iPerf, MTR, routing protocols, SSH, and network monitoring/troubleshooting tools
- Good understanding of other technology stacks such as networking, central storage & data center operations
- You have networking experience including packet decoding, layer 2 switching basics and a good understanding of the OSI model
- Experience with file systems and storage technologies such as ext3/4, XFS, ZFS, Btrfs, NAS, SAN, Samba, etc.
- Expert-level knowledge of server hardening using iptables, CSF, firewalld, and SELinux
- Hands-on experience with virtualisation technologies such as Citrix XenServer, VMware, and KVM