Linux Basic Commands for Beginners - Easy Microbial Genomics

Linux Basic Commands for Beginners

Last updated: April 21, 2022

(This also applies to macOS users.)

In this section, I will introduce the most frequently used Linux commands, especially those used in genomics and bioinformatics analysis. Linux, unlike Microsoft's Windows and Apple's macOS, is an open-source system, which means that users can access its source code and participate in its development. Some common distributions include Ubuntu, Fedora, CentOS, and Debian. In this tutorial, I recommend Ubuntu because it is stable, has long-term support releases, and is beginner-friendly, with a variety of tutorials available. Note that most of the commands are the same on Linux and macOS.

❤ The secret to learning Linux commands well is to use them and to ask Google when you have questions. You can google 'linux command [your task]'.

Installing Ubuntu under Windows

Skip to the next part if you are using macOS. If you are using an up-to-date Windows 10 or 11, you can use WSL (Windows Subsystem for Linux) to install Ubuntu as an app. Simply open the command prompt as administrator, run 'wsl --install -d Ubuntu', and follow the guide. There is no need to create a partition, format the disk, or mount anything. For more information, please see here.

Basic commands

Here are some of the most commonly used commands. For any command, run 'man [command_name]' to read its manual, for example 'man man'. Note that 'man' is itself a command.

ls

'ls' is probably the most common command in Linux. It lists the files and directories in the current or specified location.

ls [directory_name]

The option '-l' shows details of each file or directory, including permissions, owner, group, size, modification date, and name. The option '-a' lists all files and directories, including hidden ones (with names starting with a dot). The option '-h' converts sizes from bytes to a human-readable format, such as K, M, or G.

ls -lah

Example result:

drwxr-xr-x 2 root root 4.0K Apr 18 21:09 .i_am_invisible
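To see how hidden files affect the listing, here is a small sketch that works in a temporary directory. The file names are made up for illustration, and '-A' (like '-a' but without '.' and '..') is used only to make counting easier.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

touch reads.fastq          # a regular file (hypothetical name)
touch .i_am_invisible      # a hidden file: its name starts with a dot

ls                         # plain listing, hidden files are skipped
ls -lah                    # detailed listing that includes hidden files

visible=$(ls | wc -l)          # count entries without hidden files
with_hidden=$(ls -A | wc -l)   # count entries including hidden files
echo "visible: $visible, with hidden: $with_hidden"
```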

cd

Change to a particular directory

cd [directory_name]

Move one level up

cd ..

Go to your home directory

cd ~

more / less

View a file page by page. Press the space bar for the next page and 'q' to quit. 'less' also lets you scroll backwards.

more [file_name]

less [file_name]

cat

Display the whole content of a file

cat [file_name]

For gzip-compressed files (e.g. 1.gz). On macOS, 'zcat' may not accept .gz files; 'gzip -dc [file_name]' works on both systems.

zcat [file_name]

Join two files (file1, file2) and store the output in a new file (file3)

cat [file1] [file2] > [file3]
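In genomics, this is handy for merging sequence files. The sketch below joins two tiny FASTA files and then does the same with their gzip-compressed versions, which also works because a gzip file may contain several compressed parts one after another. All file names are made up, and 'gzip -dc' is used in place of 'zcat' so the example also runs on macOS.

```shell
work=$(mktemp -d)
cd "$work"

# Two tiny FASTA files, one sequence each (hypothetical content).
printf '>seq1\nACGT\n' > a.fasta
printf '>seq2\nTTGA\n' > b.fasta

# Join the plain files and count the sequence headers.
cat a.fasta b.fasta > merged.fasta
n_plain=$(grep -c '^>' merged.fasta)

# Compress each file to a new .gz file, keeping the originals.
gzip -c a.fasta > a.fasta.gz
gzip -c b.fasta > b.fasta.gz

# Joining the compressed files directly gives a valid .gz file.
cat a.fasta.gz b.fasta.gz > merged.fasta.gz
n_gz=$(gzip -dc merged.fasta.gz | grep -c '^>')

echo "plain: $n_plain sequences, gzipped: $n_gz sequences"
```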

mkdir

Make a new directory

mkdir [directory_name]

Use the option '-p' to create any missing parent directories and to suppress the error when a directory with the same name already exists

mkdir -p [directory_name]
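As a quick illustration, the sketch below uses '-p' to create a nested project layout in one go and shows that running it a second time is harmless. The directory names are hypothetical.

```shell
base=$(mktemp -d)

# Create project/raw_data/run1 in one step, including the parent directories.
mkdir -p "$base/project/raw_data/run1"

# Running it again does not fail, because the directory already exists.
status=0
mkdir -p "$base/project/raw_data/run1" || status=$?
echo "second mkdir -p exit status: $status"

ls -R "$base/project"      # show the nested layout
```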

mv

Move a file or a directory to a new location

mv [file_name/directory_name] [new_path]

Rename a file or a directory to a new name

mv [file_name/directory_name] [new_name]

rm

Remove a file or a directory

rm [file_name]

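Use the option '-r' to delete directories and their contents. Use the option '-f' to force the deletion without asking for confirmation. Be careful: files deleted with 'rm' do not go to a trash folder.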

rm -rf [directory_name]

gzip

Compress a file

gzip [file_name]

Use the option '-d' to decompress a file.

gzip -d [file_name]

Pack and compress a directory

tar -zcf [zip_file_name] [directory_name]

Unpack a .tar.gz file

tar -zxf [zip_file_name]
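Here is a small round trip as a sketch: pack a directory with '-c' (create), list the archive with '-t', and unpack it with '-x' (extract) into another directory. The directory and file names are made up, and '-C' only tells tar where to put the extracted files.

```shell
work=$(mktemp -d)
cd "$work"

# A small directory to archive (hypothetical content).
mkdir genomes
printf '>seq1\nACGT\n' > genomes/a.fna

tar -zcf genomes.tar.gz genomes          # c = create, z = gzip, f = archive file
tar -ztf genomes.tar.gz                  # t = list the contents without unpacking

mkdir restored
tar -zxf genomes.tar.gz -C restored      # x = extract, -C = target directory

# Check that the unpacked file is identical to the original.
if cmp -s genomes/a.fna restored/genomes/a.fna; then
    echo "round trip OK"
fi
```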

Environment variables

In a Linux shell, a value can be stored in a variable, and you can retrieve the value elsewhere by referring to that variable.

Assign a value to a variable (no spaces around '=')

[variable]=[value]

Display the value of a variable

echo $[variable]

Unset a variable

unset [variable]

Export a variable to the environment so that programs started from the shell can see it

export [variable]=[value]
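The difference between a plain variable and an exported one is easiest to see with a child process. In this sketch, 'genome_dir' and its path are hypothetical, and 'sh -c' simply starts a new shell that prints what it can see.

```shell
unset genome_dir                       # start clean in case it was already set
genome_dir=/data/genomes               # plain shell variable (hypothetical path)
echo "$genome_dir"                     # the current shell can read it

# A child process does not see a plain shell variable.
before=$(sh -c 'echo "${genome_dir:-unset}"')

# After export, child processes inherit it.
export genome_dir
after=$(sh -c 'echo "${genome_dir:-unset}"')

echo "child before export: $before"
echo "child after export:  $after"

unset genome_dir                       # remove the variable again
```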

User management 👍 You are the boss!

Add a new user

sudo adduser [username]

'sudo' means running the command as the superuser.

Change the password of a user

sudo passwd [username]

Remove a user

sudo userdel -r [username]

Add a user to a group

sudo usermod -a -G [groupname] [username]

Networking

Log in to a remote Linux machine using SSH

ssh [username]@[ip-address or hostname]

Copy files or directories from a remote machine

scp -r [username]@[ip-address or hostname]:[path of file or directory] [path of current machine]

Copy files or directories from the current machine to a remote machine

scp -r [file or directory of current machine] [username]@[ip-address or hostname]:[path of remote machine]

Process management

Check details on all active processes

top

If you want to only show processes of a specific user

top -u [username]

Press 'c' if you want to see the commands of the processes.

Show the processes of the current user with their commands

ps ux

Kill a process with process ID (PID)

kill [PID]

If you want to run a task in the background so that you can continue with other tasks, add an '&' at the end.

bash 1.sh &

If you want to keep a task running even after you close the session, add 'nohup' at the beginning.

nohup bash 1.sh &
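The sketch below puts the pieces together: start a job in the background with 'nohup', remember its PID, check that it is running, and stop it with 'kill'. Here 'sleep 30' stands in for a real script such as 1.sh, and the log file is a temporary file chosen for this example.

```shell
log=$(mktemp)                          # hypothetical log file for the job's output

# 'sleep 30' stands in for a long-running script such as 'bash 1.sh'.
nohup sleep 30 > "$log" 2>&1 &
pid=$!                                 # $! holds the PID of the last background job

# 'kill -0' sends no signal; it only tests whether the process exists.
if kill -0 "$pid" 2>/dev/null; then
    running=yes
else
    running=no
fi
echo "job $pid running: $running"

kill "$pid"                            # stop the job by its PID

# Collect the exit status of the job (non-zero because it was killed).
exit_status=0
wait "$pid" 2>/dev/null || exit_status=$?
echo "job exited with status $exit_status"
```

Redirecting the output to a file of your choice avoids the default 'nohup.out' file that 'nohup' otherwise creates when run from a terminal.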

👏 No worries. It will keep going.

Iteration

for i in {1..9}; do echo $i; done

for i in *; do echo "$i"; done # operate on all the files in the folder
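A typical use in genomics is to run the same command on every sequence file. This sketch counts the sequences in each FASTA file of a temporary directory. It loops over a glob pattern ('*.fasta') with quoted variables, which keeps file names containing spaces intact. The file names and contents are made up.

```shell
work=$(mktemp -d)
cd "$work"

# Two small FASTA files; one name contains a space on purpose.
printf '>s1\nACGT\n>s2\nGGCC\n' > "sample A.fasta"
printf '>s3\nTTAA\n' > sample_B.fasta

total=0
for f in *.fasta; do
    n=$(grep -c '^>' "$f")             # each header line starts with '>'
    echo "$f: $n sequences"
    total=$((total + n))               # shell arithmetic
done
echo "total: $total sequences"
```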

Storage and memory

Check storage status of the hard disk

df -h

Check file and directory sizes

du -d 2 -h ./

'-d' specifies how many directory levels deep to report. '-h' gives sizes in a human-readable format.

Find big files (e.g. > 1GB) and their details (e.g. owners). 👏 This is useful for freeing up storage on a machine.

find ./ -type f -size +1G -exec ls -lh {} +

Find big files (e.g. > 1GB) and delete them. Run the command above first to check what will be removed.

find ./ -type f -size +1G -delete
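Because '-delete' cannot be undone, it helps to try the pattern on test data first. The sketch below uses a 1 MB threshold instead of 1 GB so that it runs quickly. The file names are hypothetical, and 'dd' is used only to create a 3 MB test file.

```shell
work=$(mktemp -d)
cd "$work"

# Create one 3 MB file and one tiny file to search through.
dd if=/dev/zero of=big.bin bs=1024 count=3072 2>/dev/null
echo "tiny" > small.txt

# Dry run: list what matches before deleting anything.
found=$(find . -type f -size +1M)
echo "would delete: $found"
find . -type f -size +1M -exec ls -lh {} +

# Delete only after checking the list above.
find . -type f -size +1M -delete
ls
```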

Check memory status

free -h

Some tricks in genomics analysis

Renaming FASTA files in batch

Sometimes we need to rename many sequence files. We cannot afford to change 1000 files one by one, so we have to do it in batch.

For example, to rename all sequence files in the directory from '.fasta' to '.fna'

for i in *.fasta; do j=$(echo "$i" | sed 's/\.fasta$/.fna/'); echo "$i" "$j"; mv "$i" "$j"; done

Removing the extra fields from FASTQ file names (e.g. sample1_S1_L001_R1_001.fastq.gz to sample1_R1.fastq.gz)

for i in *.fastq.gz; do j=$(echo "$i" | sed -E 's/(.*)_S1_L001_(R[12])_001\.fastq\.gz$/\1_\2.fastq.gz/'); echo "$i" "$j"; mv "$i" "$j"; done
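It is safer to try a batch rename on empty test files first. The sketch below does this in a temporary directory. As an assumption beyond the example above, the pattern accepts any sample number and lane number ('_S[0-9]+_L[0-9]+_') rather than only '_S1_L001_'. The file names are made up, and 'mv -n' refuses to overwrite an existing file.

```shell
work=$(mktemp -d)
cd "$work"

# Empty test files with typical sequencer-style names (hypothetical).
touch sample1_S1_L001_R1_001.fastq.gz sample1_S1_L001_R2_001.fastq.gz
touch sample2_S12_L001_R1_001.fastq.gz

for i in *_001.fastq.gz; do
    # Drop the sample number, lane, and trailing _001, keeping R1 or R2.
    j=$(echo "$i" | sed -E 's/_S[0-9]+_L[0-9]+_(R[12])_001\.fastq\.gz$/_\1.fastq.gz/')
    echo "$i -> $j"
    if [ "$i" != "$j" ]; then          # skip names that did not match the pattern
        mv -n "$i" "$j"                # -n: never overwrite an existing file
    fi
done
ls
```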



❤ Well done. Keep using the commands and you will remember them easily.