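S.M.A.R.T. on Linux

S.M.A.R.T. stands for Self-Monitoring, Analysis and Reporting Technology. It lets a hard drive raise warnings before it actually fails, so you can act early.
On Linux, all you need is the smartmontools package. To confirm which package provides smartctl: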
# rpm -qf /usr/sbin/smartctl
smartmontools-5.42-2.el5

Enable the smartd service at boot (runlevel 3)
# chkconfig --level 3 smartd on
# chkconfig --list | grep smartd
smartd          0:off  1:off  2:off  3:on  4:off  5:off  6:off

Check whether SMART is enabled on the drive
# smartctl -i /dev/sda
smartctl 5.42 2011-10-20 r3458 [i686-linux-2.6.18-8.1.10.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10
Device Model:     ST3160815AS
Serial Number:    5RA19L2P
Firmware Version: 3.AAD
User Capacity:    160,040,803,840 bytes [160 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Oct 22 16:14:33 2013 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

If SMART is not enabled, turn it on
# smartctl -s on /dev/sda
smartctl 5.42 2011-10-20 r3458 [i686-linux-2.6.18-8.1.10.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF ENABLE/DISABLE COMMANDS SECTION ===
SMART Enabled.
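
The check-then-enable step above can be sketched as a small shell helper. This is only an illustration: the function name smart_is_enabled is made up here, and it assumes the "SMART support is: Enabled" line looks the way it does in the smartctl 5.42 output shown above.

```shell
# Hypothetical helper (not part of smartmontools): reads `smartctl -i`
# output on stdin and succeeds only if it reports SMART as enabled.
smart_is_enabled() {
    # Match the "SMART support is: Enabled" line seen in the output above.
    grep -q '^SMART support is:[[:space:]]*Enabled'
}

# Usage sketch (needs root; /dev/sda is only an example device):
#   smartctl -i /dev/sda | smart_is_enabled || smartctl -s on /dev/sda
```

The helper only reads text, so it can be tried against saved output before pointing it at a real drive.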

Check the drive's health
# smartctl -H /dev/sda
smartctl 5.42 2011-10-20 r3458 [i686-linux-2.6.18-8.1.10.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

However, PASSED here does not guarantee that the drive is fine, so check further.
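
For scripting, the verdict line can be pulled out of the `smartctl -H` output as a first gate. This is a rough sketch: smart_health_verdict is a hypothetical name, and it assumes the "overall-health self-assessment test result" wording shown above. As noted, a PASSED verdict is not proof of a healthy drive.

```shell
# Hypothetical helper: reads `smartctl -H` output on stdin and prints the
# overall verdict (e.g. PASSED). Exits non-zero if no verdict line is found.
smart_health_verdict() {
    awk -F': *' '
        /overall-health self-assessment test result/ { print $NF; found = 1 }
        END { if (!found) exit 1 }
    '
}

# Usage sketch (needs root; /dev/sda is only an example device):
#   smartctl -H /dev/sda | smart_health_verdict
```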

Run a short self-test, which takes a little over a minute
# smartctl -t short /dev/sda
smartctl 5.42 2011-10-20 r3458 [i686-linux-2.6.18-8.1.10.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 1 minutes for test to complete.
Test will complete after Tue Oct 22 16:19:21 2013

Use smartctl -X to abort test.

# smartctl -l selftest /dev/sda
smartctl 5.42 2011-10-20 r3458 [i686-linux-2.6.18-8.1.10.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     46396         3103025
# 2  Short offline       Completed: read failure       90%     46387         3103025
# 3  Short offline       Completed: read failure       90%     46387         3103025
# 4  Short offline       Completed: read failure       90%     45099         3103019

The log above shows read failures, so it is time to back up the data right away!

# smartctl -l selftest /dev/sdb
smartctl 5.42 2011-10-20 r3458 [i686-linux-2.6.18-8.1.10.el5] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     46426         -
# 2  Short offline       Completed without error       00%     46418         -

This second drive, by contrast, has no problems.
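
Telling the two logs apart can also be automated. The sketch below counts log entries whose status is anything other than "Completed without error". The function name count_failed_selftests is hypothetical, and it assumes the table layout printed by the smartctl version shown above. Note that it also counts aborted or still-running tests, not only read failures.

```shell
# Hypothetical helper: reads `smartctl -l selftest` output on stdin and
# prints how many logged self-tests did not complete without error.
count_failed_selftests() {
    # Log rows start with "# <number>"; skip the ones that completed cleanly.
    awk '/^# *[0-9]+ / && !/Completed without error/ { n++ } END { print n + 0 }'
}

# Usage sketch (needs root; /dev/sda is only an example device):
#   smartctl -l selftest /dev/sda | count_failed_selftests
```

A non-zero count is a reasonable trigger for sending yourself a reminder to back up the drive.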

For the full list of options
# smartctl --help

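A message that keeps appearing on screen after installing a Linux server

One of my Linux servers recently seemed to have disk problems, so I installed a new disk and reinstalled the operating system. Under CentOS 5.x this Lenovo server could not detect its disks because of an unusual disk controller card, and I had to resort to some rather special tricks to get CentOS 5 installed. That left me afraid to upgrade the kernel or move to CentOS 6.x.
With CentOS 6.x, however, that problem seems to have been solved and the whole installation went smoothly. After the installation, though, the following message kept appearing on the screen:
drivers/hid/usbhid/hid-core.c: can't reset device, 0000:00:1d.0-1.1.3/input0, status -71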

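The message kept appearing both on the console and in remote sessions, which made the machine practically unusable. At first I thought I had made a mistake during installation, so I reinstalled, but the problem was still there. A search online showed that other people had indeed run into the same problem.

Solution: disable USB autosuspend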
# echo -1 >/sys/module/usbcore/parameters/autosuspend

To keep the setting across reboots, add the same line to /etc/rc.d/rc.local
# vim /etc/rc.d/rc.local
echo -1 >/sys/module/usbcore/parameters/autosuspend
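
If you would rather script the rc.local change than edit the file by hand, something like the following avoids adding the line twice. It is only a sketch: append_line_once is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: append LINE to FILE only if no identical line exists yet.
append_line_once() {
    file=$1
    line=$2
    # -x matches whole lines, -F treats the text literally, -q stays silent.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage sketch (as root):
#   append_line_once /etc/rc.d/rc.local 'echo -1 >/sys/module/usbcore/parameters/autosuspend'
```

Running it repeatedly leaves a single copy of the line in the file.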

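Postscript:
I later found that the messages may have been caused by having both a USB keyboard and a PS/2 keyboard attached to the server at the same time. After I unplugged the USB keyboard, things seemed to return to normal!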

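Our two longest-serving servers have finally died

Of all the servers at school, only two used SCSI disks, and after more than 10 years of service both have finally died.
One hosted the ISO images of the school's site-licensed software discs, and the other hosted some old class web pages. Fortunately, the ISO files on the licensed-disc server had been backed up not long before, and the machine with the old class pages had been showing problems for a long time, so its failure was expected.

It feels a bit like life itself: there are always these stages, and perhaps what is bound to come always comes.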

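Updating Snort rules

The host that serves as our intrusion detection system uses oinkmaster to update its Snort rules. Lately (actually for quite a while now, I was just too lazy to deal with it), I have often been receiving error messages like the following in my mailbox:

http://www.snort.org/pub-bin/oinkmaster.cgi/*oinkcode*/snortrules-snapshot-2860.tar.gz
Resolving www.snort.org... 23.23.143.164
Connecting to www.snort.org|23.23.143.164|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden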
2013-09-07 23:30:03 ERROR 403: Forbidden.

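I guessed that the download path for the Snort rules had changed, so I logged in to the official Snort website and eventually found the fix:

Edit /etc/snort/oinkmaster.conf (the path may differ depending on how it was installed) and point url at the newer snapshot, 2931 instead of 2860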
# vim /etc/snort/oinkmaster.conf
url = http://www.snort.org/pub-bin/oinkmaster.cgi/<oinkcode here>/snortrules-snapshot-2931.tar.gz

Replace <oinkcode here> with your own oinkcode.
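
Since the url line differs only in the oinkcode and the snapshot number, it can be generated rather than typed. The following is a minimal sketch: oinkmaster_url_line is a hypothetical helper, and the URL pattern is simply the one used in this post, which may change again.

```shell
# Hypothetical helper: print the oinkmaster.conf "url" line for a given
# oinkcode and snapshot number (e.g. 2931), using the URL pattern from this post.
oinkmaster_url_line() {
    oinkcode=$1
    snapshot=$2
    printf 'url = http://www.snort.org/pub-bin/oinkmaster.cgi/%s/snortrules-snapshot-%s.tar.gz\n' \
        "$oinkcode" "$snapshot"
}

# Usage sketch (YOUR_OINKCODE is a placeholder, not a real code):
#   oinkmaster_url_line YOUR_OINKCODE 2931
```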

Run a test
# /usr/local/bin/oinkmaster.pl -C /etc/snort/oinkmaster.conf -o /etc/snort/rules/
Loading /etc/snort/oinkmaster.conf
Downloading file from http://www.snort.org/pub-bin/oinkmaster.cgi/*oinkcode*/snortrules-snapshot-2931.tar.gz...



    -> protocol-ftp.rules
    -> protocol-icmp.rules
    -> protocol-imap.rules
    -> protocol-nntp.rules
    -> protocol-pop.rules
    -> protocol-rpc.rules
    -> protocol-scada.rules
    -> protocol-services.rules
    -> protocol-snmp.rules
    -> protocol-telnet.rules
    -> protocol-tftp.rules
    -> protocol-voip.rules
    -> pua-adware.rules
    -> pua-other.rules
    -> pua-p2p.rules
    -> pua-toolbars.rules
    -> server-apache.rules
    -> server-iis.rules
    -> server-mail.rules
    -> server-mssql.rules
    -> server-mysql.rules
    -> server-oracle.rules
    -> server-other.rules
    -> server-samba.rules
    -> server-webapp.rules

OK, all done!