Category Archives: System

Linux platform system configuration and setup

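Who sends the most spam?

Sina's anti-spam team mined mail-related data collected over a period of time and identified the mail providers that currently send the most outbound spam.

The data show that the SOHU.COM domain has the weakest spam controls and sends the most spam. Sina and NetEase come next. Gmail ranks fifth, which is slightly counterintuitive, since Google's anti-spam technology is in theory the strongest. One plausible explanation is that Google serves users worldwide, so its mail system has far more users than the other providers. QQ Mail, at the bottom of the list, sends the least spam.

Finally, note that every entry in the list is a mail provider. This does not mean the providers themselves send spam; rather, spam marketers use the providers' user accounts to send it. Sending accounts can be bought on Taobao. Their passwords were leaked through viruses, trojans, eavesdropping by attackers, breaches of third-party websites, and other channels, then collected by the underground economy and sold in bulk.

Outbound spam ranking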

New Common Crawl data available

New crawl data is now available! The data was collected in 2013, contains approximately 2 billion web pages, and is 102 TB in size (uncompressed). A huge corpus indeed.

The entire Common Crawl data set is stored on Amazon S3 as a Public Data Set: http://aws.amazon.com/datasets/41740

Data Structure

New crawl data is located in the aws-publicdatasets bucket under the base path /common-crawl/crawl-data/.

Under this base path, crawl data is organized hierarchically as follows:

  • CRAWL-NAME-YYYY-MM – The name of the crawl, followed by the year and the week number (not the month) in which the crawl was initiated
    • segments
      • SEGMENTNAME – A segment directory, typically a unix timestamp
        • warc – contains the WARC files with the HTTP request and responses for each fetch
          • CRAWL-NAME-YYYMMMDDSS-SEQ-MACHINE.warc.gz – individual WARC files
        • wat – contains WARC-encoded WAT files which describe the metadata of each request/response
          • CRAWL-NAME-YYYMMMDDSS-SEQ-MACHINE.warc.wat.gz – individual WAT files
        • wet – contains WARC-encoded WET files with text extractions from the HTTP responses
          • CRAWL-NAME-YYYMMMDDSS-SEQ-MACHINE.warc.wet.gz – individual WET files
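
The layout above can be turned into a small path builder. This is only a sketch: cc_prefix is a made-up helper name, the s3:// URI form is an assumption about how the bucket is addressed, and the arguments in the usage line are the placeholders from the listing, not real crawl or segment names.

```shell
# Sketch: compose the S3 prefix for one file type inside one segment.
# cc_prefix is a hypothetical helper, not part of any Common Crawl tool.
cc_prefix() {
  # $1 = crawl directory, $2 = segment directory, $3 = warc | wat | wet
  case "$3" in
    warc|wat|wet) ;;
    *) echo "unknown file type: $3" >&2; return 1 ;;
  esac
  printf 's3://aws-publicdatasets/common-crawl/crawl-data/%s/segments/%s/%s/\n' "$1" "$2" "$3"
}

# Placeholders from the layout above stand in for real names.
cc_prefix CRAWL-NAME-YYYY-MM SEGMENTNAME wet
```

The resulting prefix could then be handed to whatever S3 client is in use to list the individual .gz files.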

Reducing GC and Faster Memory Allocation To Improve JVM Performance

Netty is a high-performance NIO (New IO) client-server framework for Java that Twitter uses internally as a protocol-agnostic RPC system. Twitter found problems with Netty 3's memory management for buffer allocations because it generated a lot of garbage during operation. When you send as many messages as Twitter does, this creates a lot of GC pressure, and the simple act of zero-filling newly allocated buffers consumed 50% of memory bandwidth.

Netty 4 fixes this situation with:

  • No more short-lived event objects: methods on long-lived channel objects are used to handle I/O events instead.
  • A specialized buffer allocator backed by a pool that implements buddy memory allocation and slab allocation.
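
To give a feel for the buddy part: a buddy allocator hands out blocks whose sizes are powers of two, so each request is rounded up to the next power of two. The sketch below illustrates only that rounding step. It is not Netty's code, and buddy_block_size is a made-up name.

```shell
# Sketch: smallest power-of-two block size that fits a request of $1 bytes.
buddy_block_size() {
  size=1
  while [ "$size" -lt "$1" ]; do
    size=$((size * 2))
  done
  echo "$size"
}

buddy_block_size 1000   # prints 1024
buddy_block_size 4096   # prints 4096
```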

The result:

  • 5 times less frequent GC pauses: 45.5 vs. 9.2 times/min
  • 5 times less garbage production: 207.11 vs 41.81 MiB/s
  • The buffer pool is much faster than JVM allocation as the buffer size increases, though it has some problems with smaller buffers.
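
The "5 times" figures can be checked against the quoted numbers with a line of awk. The ratio helper is just an illustration.

```shell
# ratio A B prints A / B rounded to two decimals.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 45.5 9.2       # GC pauses per minute, before vs. after: prints 4.95
ratio 207.11 41.81   # garbage produced in MiB/s, before vs. after: prints 4.95
```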

Given how many services use the JVM in their messaging infrastructure and how many services have GC-related performance problems, this is an impressive result that others may want to consider.

For more details on the improvements, please refer to this presentation.

Linux Shell tricks

Suspend the foreground process (run bg afterwards to resume it in the background):

Ctrl + z


Move process to foreground:

fg


Create an empty file:

touch a.file


Execute commands from a file in the current shell:

source /home/user/file.name


Substring for first 5 characters:

${variable:0:5}
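
The expansion above is a bash feature. For scripts that have to run under a plain POSIX sh, the same result can be had with printf or cut. A small sketch, with a made-up sample value:

```shell
variable="Hello, world"   # sample value for illustration

# A printf precision truncates the string to 5 bytes
printf '%.5s\n' "$variable"               # prints Hello

# cut selects characters 1 through 5
printf '%s\n' "$variable" | cut -c1-5     # prints Hello
```

For plain ASCII text both give the same result as the bash expansion. With multibyte characters printf counts bytes, so the results can differ.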


 

SSH with pem key:

ssh -i key.pem user@ip_address


Download a complete directory tree to a local directory with wget:

wget -r --no-parent --reject "index.html*" http://hostname/ -P /home/user/dirs


Create nested directories recursively:

mkdir -p /home/user/dir1/dir2/dir3


Create multiple directories:

mkdir -p /home/user/{test,test1,test2}


List the process tree, including child processes:

ps axwef


List the contents of a war or jar file:

jar -tf demo1.jar


Create war file:

jar -cvf name.war file


Test disk write speed (writes a 2 GiB file, then removes it):

dd if=/dev/zero of=/tmp/output.img bs=8k count=256k; rm -rf /tmp/output.img


Test disk read speed:

hdparm -Tt /dev/sda


Get md5 hash from text:

echo -n "text" | md5sum


Check xml syntax:

xmllint --noout file.xml


Extract a tar.gz into another directory (new_dir must already exist):

tar zxvf package.tar.gz -C new_dir


Get HTTP headers with curl:

curl -I http://www.example.com


Modify the timestamp of a file or directory (YYMMDDhhmm):

touch -t 0712250000 file


Generate a random password (16 characters long in this case):

LC_ALL=C tr -dc '_A-Za-z0-9-' < /dev/urandom | head -c "${1:-16}"; echo


Quickly create a backup of a file:

cp some_file_name{,.bkp}


Passwordless SSH login on Ubuntu:

ssh-keygen
ssh-copy-id not-marco@remote.hosts

After that, logging in to the remote host should no longer prompt for the account password.


 

Update the date from the Ubuntu NTP server:

ntpdate ntp.ubuntu.com


Show all listening TCP (IPv4) ports with netstat:

netstat -lnt4 | awk '{print $4}' | cut -f2 -d: | grep -o '[0-9]*'


Convert image from qcow2 to raw:

qemu-img convert -f qcow2 -O raw precise-server-cloudimg-amd64-disk1.img \
                                 precise-server-cloudimg-amd64-disk1.raw


Run a command repeatedly, displaying its output (every two seconds by default):

watch ps -ef


List all users:

getent passwd


Mount root in read/write mode:

mount -o remount,rw /


Mount a directory (for cases when symlinking will not work):

mount --bind /source /destination


Send a dynamic update to a DNS server:

nsupdate <<EOF
update add $HOST 86400 A $IP
send
EOF
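
Before sending anything to a real DNS server, it can help to see exactly what the here-document expands to. The sketch below feeds the same text to cat instead of nsupdate. The HOST and IP values and the preview_update name are made up for illustration.

```shell
HOST=host.example.com   # hypothetical record name
IP=192.0.2.10           # address from the RFC 5737 documentation range

# Same here-document as above, sent to cat so nothing reaches a DNS server.
preview_update() {
  cat <<EOF
update add $HOST 86400 A $IP
send
EOF
}

preview_update
```

The variables are expanded because the EOF delimiter is unquoted; with <<'EOF' the text would be passed through literally.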


Recursively grep all directories:

grep -r "some_text" /path/to/dir


List the ten largest open files:

lsof / | awk '{ if($7 > 1048576) print $7/1048576 "MB "$9 }' | sort -n -u | tail


Show free RAM in MB (relies on the "-/+ buffers/cache" line printed by older versions of free):

free -m | grep cache | awk '/[0-9]/{ print $4" MB" }'


Open Vim and jump to end of file:

vim + some_file_name


Git clone specific branch (master):

git clone git@github.com:name/app.git -b master


Git switch to another branch (develop):

git checkout develop


Git delete branch (myfeature):

git branch -d myfeature


Git delete remote branch:

git push origin :branchName


Git push new branch to remote:

git push -u origin mynewfeature


Print out the last cat command from history:

!cat:p


Run your last cat command from history:

!cat


Find all empty subdirectories in /home/user:

find /home/user -maxdepth 1 -type d -empty


Print lines 50 to 60 of test.txt:

sed -n '50,60p' test.txt
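
A quick way to convince yourself the range is inclusive is to run it on a file whose lines are their own line numbers. The sketch below builds such a file in a temporary location. The second sed call is a common variation that quits at line 60 instead of reading the rest of the file.

```shell
# Build a 100-line sample file; each line holds its own line number.
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100; i++) print i }' > "$sample"

# Lines 50 through 60, inclusive (11 lines)
sed -n '50,60p' "$sample"

# Same output, but stop reading once line 60 has been printed
sed -n '50,60p;60q' "$sample"

rm -f "$sample"
```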


Rerun the last command with sudo (if it was mkdir /root/test, this runs sudo mkdir /root/test):

sudo !!


Create a temporary RAM filesystem, or ramdisk (create the /tmpram directory first):

mount -t tmpfs tmpfs /tmpram -o size=512m


Grep whole words:

grep -w "name" test.txt
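
The difference from a plain substring match is easiest to see on a few sample lines (made up here for illustration):

```shell
# Three sample lines; only the first contains "name" as a whole word.
printf 'name\nusername\nfilenames\n' | grep -w "name"   # prints: name

# Without -w every line matches, so the count is 3.
printf 'name\nusername\nfilenames\n' | grep -c "name"   # prints: 3
```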


Append text to a file that requires elevated privileges:

echo "some text" | sudo tee -a /path/file


List all supported kill signals:

kill -l


Generate a random password (from 16 random bytes; the Base64 output is 24 characters long):

openssl rand -base64 16


Do not log the current session in bash history (this kills the current shell):

kill -9 $$


Scan a network for hosts with a given port open:

nmap -p 8081 172.20.0.0/16


Set git email:

git config --global user.email "me@example.com"


Sync with master when you have unpublished commits:

git pull --rebase origin master


Move all files with "txt" in their name to /home/user:

find . -iname "*txt*" -exec mv -v {} /home/user \;


Put the lines of two files side by side:

paste test.txt test1.txt


Progress bar in shell:

pv data.log


Send data to a Graphite server with netcat:

echo "hosts.sampleHost 10 `date +%s`" | nc 192.168.200.2 3000


Convert tabs to spaces:

expand test.txt > test1.txt
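
By default expand pads each tab out to the next multiple of 8 columns; -t picks a different tab width. A small sketch on a made-up one-line input:

```shell
# Default tab stops are every 8 columns.
printf 'a\tb\n' | expand        # "a", 7 spaces, "b"

# With -t 4 the tab is padded to column 4 instead.
printf 'a\tb\n' | expand -t 4   # "a", 3 spaces, "b"
```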


Keep a command out of bash history by prefixing it with a space (requires HISTCONTROL to include ignorespace or ignoreboth):

<space>cmd


Go to the previous working directory:

cd -


Split a large tar.gz archive into 100 MB pieces and put it back together:

split -b 100m /path/to/large/archive /path/to/output/files
cat files* > archive
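
It is worth checking that the reassembled file is identical to the original. The sketch below does a small-scale round trip in a temporary directory, using random bytes as a stand-in for a real archive. The file names and sizes are arbitrary choices for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in for a large archive: 256000 random bytes
head -c 256000 /dev/urandom > archive.orig

# Split into 100 KiB pieces: part_aa, part_ab, part_ac
split -b 100k archive.orig part_

# Reassemble (the shell expands part_* in sorted order) and compare
cat part_* > archive.joined
cmp archive.orig archive.joined && echo "archives match"

# Remove "$workdir" when done.
```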


Get the HTTP status code with curl:

curl -sL -w "%{http_code}\n" -o /dev/null www.example.com


Set root password and secure MySQL installation:

/usr/bin/mysql_secure_installation


When Ctrl + c does not work, send SIGQUIT instead:

Ctrl + \


Get file owner:

stat -c %U file.txt


List block devices:

lsblk -f


Find files with trailing spaces:

find . -type f -exec grep -El ' +$' {} \;


Find files with tab indentation (lines that start with a tab):

find . -type f -exec grep -l $'^\t' {} \;


Print a horizontal line of "=" characters:

printf '%100s\n' | tr ' ' =
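
The same trick wraps up nicely as a function with a configurable width and character. hr is a made-up name, and the defaults mirror the one-liner above.

```shell
# hr WIDTH CHAR: print CHAR repeated WIDTH times (defaults: 100 and "=").
hr() {
  # printf pads an empty string to WIDTH spaces; tr swaps each space for CHAR.
  printf "%${1:-100}s\n" '' | tr ' ' "${2:-=}"
}

hr 20       # ====================
hr 10 '-'   # ----------
```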


UPDATE: November 2, 2013