Wednesday, February 22, 2023

Matillion: Automation - Restart Matillion Services Weekly

Non-PRD (non-production) Matillion servers are used heavily by developers every day, so many processes and connections stay open at the server level. As they accumulate, other users may be unable to log in to the system, or server performance degrades. To resolve this, the admin team developed an automated script that recycles the non-prod Matillion services without manual intervention and sends email alerts on the success or failure of the restart. Recycling the services kills all hung or unused processes and frees the temporary memory they were holding on the Linux server.




  1. The script is run on a schedule by an external scheduler or crontab.
  2. The script first checks whether the tomcat service is running.
  3. If the tomcat service is not running, the restart is not performed and a failure mail is sent.
  4. If the tomcat service is running, the script executes the stop command.
  5. After about 50 seconds, the script checks whether the tomcat service has stopped.
  6. If the tomcat service has stopped, the script executes the start command, waits about 50 seconds, confirms that the service is running again and then sends a success notification email to the configured recipients.
  7. If the tomcat service has not stopped, the script sends a failure mail for that attempt and returns to step 4 to run the stop command again.
  8. If the tomcat service still has not stopped after 3 consecutive attempts, the script stops retrying. One failure notification mail is sent per failed attempt.
  9. All activities are recorded in the restart_log.txt file.
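The steps above can be sketched independently of systemd, which makes the control flow easy to dry-run. This is a minimal sketch, not the production script: `restart_with_retries` is a hypothetical name, the check/stop/start commands are passed in as function names, and the wait between actions is a parameter (the real script sleeps 50 seconds).

```shell
#!/bin/bash
# Hypothetical sketch of steps 2-8: check, stop, verify, retry up to 3 times.
# Prints one outcome word: already_stopped, restarted, start_failed or stop_failed.
restart_with_retries() {
    local check_cmd=$1 stop_cmd=$2 start_cmd=$3 wait_secs=${4:-50}
    local max_attempts=3 attempt

    # Steps 2-3: never stop or start a service that was not running.
    if ! "$check_cmd"; then
        echo "already_stopped"
        return 1
    fi

    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        "$stop_cmd"                     # step 4: stop the service
        sleep "$wait_secs"              # step 5: wait, then re-check
        if ! "$check_cmd"; then
            "$start_cmd"                # step 6: start and confirm
            sleep "$wait_secs"
            if "$check_cmd"; then
                echo "restarted"
                return 0
            fi
            echo "start_failed"
            return 1
        fi
        # Step 7: still running, so this attempt failed; loop to stop again.
    done
    echo "stop_failed"                  # step 8: all attempts used up
    return 1
}
```

For a dry run, define stubs such as `fake_check() { [ "$state" = running ]; }` and pass a wait of `0`; in production the three commands would wrap `systemctl status`, `sudo systemctl stop` and `sudo systemctl start` for tomcat8.service.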

Benefits:

·  Kills unwanted processes and frees server CPU for Matillion.

·  Prevents jobs from getting stuck in a hung state.

·  Keeps the web services available and responsive.

·  Kills existing long-running jobs.

·  Requires no manual intervention.

·  Alerts the admin team with success/failure notifications.

Process Highlights:

·  If the web server is already stopped (for example, after an ad hoc request), the script does not execute the stop command; it only sends a failure notification mail.

·  The script makes up to three attempts to stop the web server and sends a failure mail for each failed attempt, so a run that never manages to stop the server sends 3 failure mails.

·  The script logs all of its actions for easy analysis.

Log Files: Every action the script executes is written to the restart_log.txt log file, with a timestamp (MM/DD/YY HH:MM:SS) for tracing. The time zone used in the log is UTC. New entries are appended on every execution of the script.
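A minimal sketch of a log helper that produces the same `MM/DD/YY HH:MM:SS :message` lines as the script. It assumes you want UTC regardless of the server settings, so it calls `date -u`; the script itself calls `date` without `-u` and therefore uses the server's configured time zone, which gives UTC only if the server clock is set to UTC. `log_line` and `LOG_FILE` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical log helper: appends "MM/DD/YY HH:MM:SS :message" in UTC.
LOG_FILE=${LOG_FILE:-restart_log.txt}

log_line() {
    # -u forces UTC; %D is MM/DD/YY and %T is HH:MM:SS.
    printf '%s :%s\n' "$(date -u '+%D %T')" "$1" >> "$LOG_FILE"
}
```

Because every line shares this prefix, a plain `grep 'failed' restart_log.txt` is enough to list the failed stop attempts recorded by the script.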

Email Notifications: The script sends three types of mail, depending on the scenario:

       Restart Success Mail

       Restart Failure Mail

       Already in stop state Mail
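The three mails differ only in their subject line, so one helper can send all of them. A minimal sketch, assuming a sendmail-compatible MTA at `/usr/sbin/sendmail` as the script uses: it writes explicit `To:` and `Subject:` headers and passes `-t` so recipients are read from the headers, which works the same way under Sendmail and Postfix. `notify`, `SENDMAIL`, `MAIL_FROM` and `MAIL_TO` are hypothetical names; `SENDMAIL` can point at another command for a dry run.

```shell
#!/bin/bash
# Hypothetical notification helper for the three mail types.
SENDMAIL=${SENDMAIL:-/usr/sbin/sendmail}
MAIL_FROM=${MAIL_FROM:-ambarish@abcd.com}
MAIL_TO=${MAIL_TO:-ambarish@abcd.com}

# notify TYPE [ATTEMPT]: TYPE is success, failure or already_stopped.
notify() {
    local attempt=${2:-} subject
    case "$1" in
        success)         subject="QA Matillion Services Restarted Successfully" ;;
        failure)         subject="QA Matillion Services Restart Attempt $attempt failed" ;;
        already_stopped) subject="QA Matillion Services Already In Stop State, Restart Attempt $attempt failed" ;;
        *)               echo "notify: unknown type '$1'" >&2; return 2 ;;
    esac
    # -t: take recipients from the To: header; -f: envelope sender.
    printf 'To: %s\nSubject: %s\n\n' "$MAIL_TO" "$subject" |
        "$SENDMAIL" -t -f "$MAIL_FROM"
}
```

For example, `notify failure 2` sends the "Restart Attempt 2 failed" mail.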

Script Details

·       Script name: restart_sh.sh

·       Script Path: /home/centos

·       Script Owner: root

·       Script Permission: Read, Write, Execute

·       Generated files: The script creates the following files in its working directory (/home/admin_usr1, set by the cd command in the script).

o   status.txt – Holds the output of systemctl status for the tomcat service; the script searches it to decide whether the service is running. It is overwritten on every execution.

o   restart_log.txt – The log file. It records each action performed by the script, with date and time. The time zone used in the log is UTC. New entries are appended on every execution of the script.
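The status.txt check comes down to searching the saved `systemctl status` output for the text `active (running)`. A minimal sketch of that test as reusable functions (`save_status` and `status_file_running` are hypothetical names). On systemd hosts, `systemctl is-active --quiet tomcat8.service` gives a comparable yes/no answer through its exit code without a temporary file, which is an option if status.txt is not needed for troubleshooting.

```shell
#!/bin/bash
# Hypothetical helper: save the unit status to a file (needs systemd).
# systemctl status exits non-zero for a stopped unit, so that is ignored here.
save_status() {
    systemctl status "${2:-tomcat8.service}" > "$1" || true
}

# Succeed if a saved status file shows the unit as "active (running)".
# "inactive (dead)" and "activating (start)" do not match.
status_file_running() {
    grep -q 'active (running)' "$1"
}
```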

Script:

#!/bin/bash
# restart_sh.sh: recycle the Matillion tomcat service and mail the result.
recipients="ambarish@abcd.com"
sender="ambarish@abcd.com"
work_dir="/home/admin_usr1"
status_file="$work_dir/status.txt"
log_file="$work_dir/restart_log.txt"

cd "$work_dir" || exit 1

# Append a line prefixed with MM/DD/YY HH:MM:SS to the log.
log() {
    echo "$(date "+%D %T") :$1" >> "$log_file"
}

# Mail a subject-only message. sendmail -t reads the recipients from the
# To: header, so the header is written explicitly.
send_mail() {
    printf 'To: %s\nSubject: %s\n\n' "$recipients" "$1" |
        /usr/sbin/sendmail -t -f "$sender"
}

# Write the service status to status.txt; return 0 if tomcat is running.
is_running() {
    systemctl status tomcat8.service > "$status_file"
    grep -q 'active (running)' "$status_file"
}

if is_running; then
    run_flag=1
    log "webserver is active"
else
    run_flag=0
    log "webserver is inactive"
fi

attempt=1
while [ "$attempt" -le 3 ]; do
    log "stop webserver attempt: $attempt"
    if [ "$run_flag" -eq 0 ]; then
        # Stopped before the script ran (e.g. ad hoc request): do not restart.
        log "existing webserver was already in stopped state, restart aborted"
        send_mail "QA Matillion Services Already In Stop State, Restart Attempt $attempt failed"
        log "failure mail sent"
        break
    fi

    sudo systemctl stop tomcat8.service
    echo "$(date "+%D %T") :stop command executed"
    log "stop webserver cmd executed"
    log "waiting for webserver to stop"
    sleep 50s

    if is_running; then
        # Still running: report this attempt and try the stop again.
        log "webserver not responding, attempt $attempt failed"
        send_mail "QA Matillion Services Restart Attempt $attempt failed"
        log "failure mail sent"
    else
        sudo systemctl start tomcat8.service
        echo "$(date "+%D %T") :start command executed"
        log "start tomcat cmd executed, attempt $attempt successful"
        sleep 50s
        if is_running; then
            log "service is in running state"
            send_mail "QA Matillion Services Restarted Successfully"
            log "success mail sent"
        else
            # Service did not come back up: log it and alert the admin team.
            log "service did not start after attempt $attempt"
            send_mail "QA Matillion Services Restart Failed, Service Not Started"
            log "failure mail sent"
        fi
        break
    fi

    attempt=$((attempt + 1))
done
log "------------------end of execution-----------------"

# To edit the crontab:
#   crontab -e
# To follow the log while the script runs:
#   tail -f /home/admin_usr1/restart_log.txt
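The title calls for a weekly run, but the post does not give the exact schedule. A minimal sketch, assuming Sundays at 02:00 server time (the schedule `0 2 * * 0` is an assumption) and the script at /home/centos/restart_sh.sh as listed under Script Details: the hypothetical helper `add_weekly_entry` reads an existing crontab on stdin and prints it with the entry present exactly once, so it is safe to re-run.

```shell
#!/bin/bash
# Hypothetical helper: add a weekly crontab entry for the restart script once.
# Assumed schedule: minute 0, hour 2, any day of month, any month, Sunday (0).
SCRIPT_PATH=${SCRIPT_PATH:-/home/centos/restart_sh.sh}
CRON_ENTRY="0 2 * * 0 $SCRIPT_PATH"

add_weekly_entry() {
    # Drop any existing line that mentions the script, then append the entry.
    # grep exits 1 when it outputs nothing, which is fine here.
    grep -vF "$SCRIPT_PATH" || true
    echo "$CRON_ENTRY"
}
```

Run as root, `crontab -l 2>/dev/null | add_weekly_entry | crontab -` installs the entry without creating duplicates.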
