Monitoring and Alerting Engineer Job at BNSF Railway, Fort Worth, TX

  • BNSF Railway
  • Fort Worth, TX

Job Description

Monitoring and Alerting Engineer

Pay Rate: $75.00/hr. to $82.00/hr., with full benefits

Location: Fort Worth, TX

Schedule: Onsite 4 days per week (required)

Some weekend availability may be required from time to time.

7-day/24-hour operating environment. After-hours support flexibility is REQUIRED.

Job Description

A Monitoring and Alerting Engineer is a specialized IT professional responsible for the design, implementation, and management of monitoring and alerting systems for an organization's IT infrastructure. Their primary goal is to ensure the continuous availability, reliability, and performance of critical systems and applications. By leveraging various monitoring tools and technologies, they proactively identify and address potential issues before they impact business operations.

Key Responsibilities

  • System Monitoring: Implement and maintain monitoring solutions to track the performance, health, and availability of IT systems, applications, and networks.
  • Alert Management: Configure and manage alerting mechanisms to ensure timely notifications of any anomalies, failures, or performance degradations.
  • Incident Response: Collaborate with support and operations teams to analyze and resolve events, and lead the resolution process during incidents and outages.
  • Root Cause Analysis: Conduct thorough investigations to determine the root cause of incidents and implement corrective actions to prevent recurrence.
  • Optimization: Identify opportunities for system optimization and performance improvements through data analysis and trend identification.
  • Tool Evaluation and Integration: Evaluate, recommend, and integrate new monitoring and alerting tools and technologies to enhance the organization's monitoring capabilities.
  • Documentation and Reporting: Develop and maintain comprehensive documentation, including monitoring configurations, incident reports, and performance metrics.
  • Collaboration and Communication: Work closely with various IT teams, including application, infrastructure, and DevOps teams, to ensure seamless operations and effective communication during incidents.

Performance of Duties

  • Operate in a 7-day/24-hour environment with after-hours support flexibility.
  • Collaborate with internal teams and suppliers to lead and carry out event resolution across all mission-critical IT and Telecom service levels.
  • Protect business system availability through integrated incident, problem, and change management.
  • Monitor systems for faults and optimization opportunities.
  • Assist the major incident response team and escalate critical events.
  • Evaluate and improve monitoring/alerting tools and processes.
  • Conduct technical root cause analysis and engage with management teams for internal issues.
  • Identify potential business-impacting events and manage incident processes.
  • Provide expert guidance during reviews and debriefs.
  • Analyze problem trends and monitor tools to identify chronic activity.
  • Communicate effectively with senior management.

Required Qualifications

  • Experience with Dynatrace, AppMon, Zabbix, SCOM, Datadog, CloudWatch, X-Ray, and Splunk.
  • Strong understanding of IT infrastructure, including servers, networks, databases, and cloud environments.
  • Some experience with incident, problem, and change management processes is a plus.
  • Ability to analyze complex systems and identify performance bottlenecks.
  • Excellent troubleshooting and problem-solving skills.
  • Effective communication and collaboration skills.
  • Familiarity with ITIL best practices and service management frameworks.
  • Self-motivated and able to work in a 7x24 environment.
  • Experience managing critical system outages and interacting at all organizational levels.
  • On-call support availability.

Preferred Qualifications

  • B.S. degree in Computer Science, Information Systems, or Engineering.
  • Technical expertise in distributed systems/administration and general scripting/programming (Python, Node.js, Ruby, Perl, Bash/sh).
  • Excellent writing and communication skills.
  • ServiceNow experience.
