Data Engineering Journey - no. 06 | Hands On Programming

I have been a Linux user for over 14 years. My first encounter with it was during high school, and later I delved deeper into it during my university studies. From that moment on, I really liked Linux as an operating system, especially because it ran smoothly on my faulty computer without the blue screens of death (BSOD) that Windows often produced.

The first Linux distribution I installed, initially on a virtual machine and later on a dedicated system partition, was Mandriva. After that, I moved on to Fedora, with which I had more extensive experience, followed by Ubuntu, and eventually Arch Linux.

Over time, I experimented with other distributions, including CentOS, Rocky Linux, and AlmaLinux.

When it comes to Linux, there’s a wealth of information available online. Back in the days before Discord, we relied on IRC and online forums dedicated to Linux-related topics. But coming back to the present, I think it’s high time to write some notes about Linux and perhaps share a few recommendations for tools that are most commonly used in this environment, regardless of the distribution you’re working with.

1. What have you learned?

# List files and directories
ls -l       # List files and directories with details
ls -a       # List all files and directories, including hidden ones

# Directory operations
mkdir       # Create a new directory
rmdir       # Remove an empty directory
mv          # Move or rename files and directories
cd          # Change the current working directory
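To see the directory commands working together, here is a minimal sketch that runs them inside a throwaway directory created with mktemp -d (assumed available, as it is on most Linux systems). The function name dir_demo and the folder names are hypothetical. It also shows mkdir -p, which creates missing parent directories, and that rmdir refuses to delete a directory that is not empty.

```shell
# Hypothetical helper: exercise mkdir, mv, and rmdir in a scratch directory
dir_demo() {
    workdir=$(mktemp -d) || return 1
    (
        cd "$workdir" || exit 1                  # subshell, so our own cwd is untouched
        mkdir reports                            # create a directory
        mkdir -p archive/2024/q1                 # -p also creates the missing parents
        touch reports/draft.txt                  # create an empty file
        mv reports/draft.txt reports/final.txt   # rename a file
        mv reports/final.txt archive/2024/q1/    # move it into another directory
        rmdir reports                            # succeeds: reports is now empty
        rmdir archive 2>/dev/null || echo "rmdir refused: archive is not empty"
        ls archive/2024/q1                       # final.txt
    )
    rm -r "$workdir"                             # clean up the scratch directory
}

dir_demo
```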

# Text editing
gedit       # Opens a file in a graphical text editor

# Text processing
head -n N   # Display the first N lines of a file
tail -n N   # Display the last N lines of a file
tr          # Translate, squeeze (-s), or delete (-d) characters

# Example 1: Convert uppercase to lowercase and vice versa
echo "Hello World" | tr "A-Za-z" "a-zA-Z"
# Output: hELLO wORLD

# Example 2: Delete every occurrence of the listed characters ('k' and 'i')
echo "wiki wiki" | tr -d "ki"
# Output: w w
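head and tail are listed above without an example. A short sketch, using a hypothetical sample function that prints a header line followed by five numbered lines: tail -n +K starts output at line K, which is handy for skipping a header, and chaining head into tail extracts a range of lines.

```shell
# Hypothetical sample data: a header line followed by five numbered lines
sample() {
    printf 'id\n1\n2\n3\n4\n5\n'
}

sample | head -n 3               # first 3 lines: id, 1, 2
sample | tail -n 2               # last 2 lines: 4, 5
sample | tail -n +2              # from line 2 to the end (skips the header)
sample | head -n 4 | tail -n 2   # lines 3-4 only: 2, 3
```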

# Extract text fields
cut         # Select fields (-f) or character positions (-c) from each line; -d sets the field delimiter
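A minimal sketch of cut on comma-separated data; the csv function and its rows are made up for illustration.

```shell
# Made-up comma-separated sample
csv() {
    printf 'name,age,city\nana,31,Cluj\nbob,27,Iasi\n'
}

csv | cut -d ',' -f 1,3              # fields 1 and 3: name,city / ana,Cluj / bob,Iasi
csv | cut -d ',' -f 2- | head -n 1   # field 2 to the end of the header: age,city
csv | cut -c 1-3                     # first 3 characters of each line: nam / ana / bob
```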

# Example: Show files from largest to smallest
ls -l | tr -s " " | cut -f 5,9 -d " " | sort -rn
# Explanation:
# - `tr -s " "`: Squeeze runs of spaces into one
# - `cut -f 5,9 -d " "`: Select fields 5 and 9 (size and name)
# - `sort -rn`: Sort numerically in reverse order (largest first)
# Caveat: names containing spaces are truncated after their first word
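Parsing ls -l output is fragile: besides breaking on names that contain spaces, the leading "total" line turns into an empty result line. As an alternative, ls -S (supported by GNU and BSD ls) sorts by size, largest first; add -l to see the sizes, or -r to reverse the order. The sketch below assumes mktemp -d is available; size_demo and the file names are hypothetical.

```shell
# Hypothetical helper: create files of known sizes, then list them largest first
size_demo() {
    dir=$(mktemp -d) || return 1
    printf '%010d' 0  > "$dir/medium.txt"   # 10 bytes
    printf '%0100d' 0 > "$dir/large.txt"    # 100 bytes
    printf 'x'        > "$dir/small.txt"    # 1 byte
    ls -S "$dir"                            # large.txt, medium.txt, small.txt
    rm -r "$dir"
}

size_demo
```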

# Sorting and unique values
sort        # Sort lines of text
uniq        # Collapse adjacent duplicate lines (usually run after sort)
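Because uniq only compares neighboring lines, it is almost always paired with sort. A minimal sketch with a made-up pages list shows the difference, plus the common sort | uniq -c | sort -rn idiom for counting how often each line occurs.

```shell
# Made-up list of visited pages; the duplicates are not adjacent
pages() {
    printf 'home\nblog\nhome\nabout\nblog\nhome\n'
}

pages | uniq | wc -l                 # 6: uniq alone finds no adjacent repeats here
pages | sort | uniq                  # about, blog, home
pages | sort -u                      # same result in one command
pages | sort | uniq -c | sort -rn    # frequency count, most common first (3 home)
```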

# Search for patterns in text
grep        # Select text lines matching a pattern

# Example: Count regular files in the current directory
ls -l | grep "^-" | wc -l
# - `grep "^-"`: Match lines starting with '-' (regular files)
# - `wc -l`: Count lines
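grep -c prints the number of matching lines itself, so the wc -l step can be dropped. A sketch wrapping that in a hypothetical count_entries helper, which also counts subdirectories (lines starting with 'd'); hidden entries are skipped because ls is run without -a.

```shell
# Hypothetical helper: count regular files and subdirectories in a directory
# (defaults to the current one). The first character of each ls -l line is
# the entry type: '-' for a regular file, 'd' for a directory.
count_entries() {
    target=${1:-.}
    # grep -c exits with status 1 when the count is 0; || true stops
    # scripts running under set -e from aborting in that case
    files=$(ls -l "$target" | grep -c '^-' || true)
    dirs=$(ls -l "$target" | grep -c '^d' || true)
    echo "files=$files dirs=$dirs"
}

count_entries
```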

# Common grep patterns:
grep 'smug' files        # Search for lines containing 'smug'
grep '^smug' files       # Lines starting with 'smug'
grep 'smug$' files       # Lines ending with 'smug'
grep '^smug$' files      # Lines containing only 'smug'
grep '^\^s' files        # Lines starting with '^s' (the second '^' is escaped)
grep '[Ss]mug' files     # Matches 'Smug' or 'smug'
grep 'B[oO][bB]' files   # Matches 'BOB', 'Bob', 'BOb', or 'BoB'
grep '^$' files          # Matches blank lines
grep '[0-9][0-9]' files  # Lines containing at least two consecutive digits
grep '[a-zA-Z]' files    # Lines with at least one letter
grep '[^a-zA-Z0-9]' files # Lines with at least one non-alphanumeric character
grep '[0-9]\{3\}-[0-9]\{4\}' files # Lines containing a phone number like 999-9999
grep '^.$' files         # Lines with exactly one character
grep '"smug"' files      # 'smug' within double quotes
grep '^\.' files         # Lines starting with a period '.'
grep '^\.[a-z][a-z]' files # Lines starting with '.' followed by 2 lowercase letters
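To try several of the patterns above, here is a sketch that runs them against a few made-up lines (grep_sample is a hypothetical name). It also shows grep -E, which switches to extended regular expressions where the interval braces need no backslashes, so '[0-9]{3}-[0-9]{4}' there matches the same lines as the escaped basic form.

```shell
# Made-up sample lines, including one blank line
grep_sample() {
    printf '%s\n' 'smug' 'Smug cat' 'not smug' '' 'call 555-1234' '.profile' 'x'
}

grep_sample | grep -c 'smug'                 # 2: 'smug' and 'not smug' (case-sensitive)
grep_sample | grep -c '[Ss]mug'              # 3: also 'Smug cat'
grep_sample | grep -c '^smug$'               # 1: the line that is exactly 'smug'
grep_sample | grep -c '^$'                   # 1: the blank line
grep_sample | grep '[0-9]\{3\}-[0-9]\{4\}'   # call 555-1234 (basic regex)
grep_sample | grep -E '[0-9]{3}-[0-9]{4}'    # call 555-1234 (extended regex)
grep_sample | grep '^\.[a-z][a-z]'           # .profile
grep_sample | grep -c '^.$'                  # 1: 'x'
```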

# File operations
cat          # Display file content
wc           # Line, word, and byte count
wc -l        # Count lines
wc -w        # Count words
wc -c        # Count bytes (use wc -m for characters)
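A minimal sketch of the wc options, including the difference that input redirection makes: with < the file arrives on standard input, so wc prints only the count and no file name. The file_stats helper is hypothetical, and the scratch file is created with mktemp.

```shell
# Hypothetical helper: print line, word, and byte counts for a file.
# Reading through '<' gives wc only stdin, so it prints bare numbers;
# tr removes the padding some wc versions add.
file_stats() {
    l=$(wc -l < "$1" | tr -d ' ')
    w=$(wc -w < "$1" | tr -d ' ')
    c=$(wc -c < "$1" | tr -d ' ')
    echo "lines=$l words=$w bytes=$c"
}

f=$(mktemp)
printf 'one two\nthree\n' > "$f"   # made-up content
wc -l "$f"                         # the count followed by the file name
file_stats "$f"                    # lines=2 words=3 bytes=14
rm "$f"
```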

# Redirecting output
cmd > file    # Store output in a file (overwrite)
cmd >> file   # Append output to a file
cmd < file    # Read input from a file
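A minimal sketch contrasting > and >>, and showing < feeding a file to a command; redirect_demo is a hypothetical name and the scratch file comes from mktemp.

```shell
# Hypothetical helper: overwrite vs. append, then read the file back
redirect_demo() {
    log=$(mktemp) || return 1
    echo "first run"  > "$log"   # > truncates the file, then writes
    echo "second run" > "$log"   # overwrites: 'first run' is gone
    echo "third run" >> "$log"   # >> appends after the existing content
    cat "$log"                   # second run, third run
    sort -r < "$log"             # < feeds the file to sort's stdin: third run, second run
    rm "$log"
}

redirect_demo
```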

# Variables and arithmetic
a=5
b=3
echo $a          # Prints 5
echo $((a*b))    # Prints 15 (the older $[a*b] form is deprecated in bash)
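$((...)) is the POSIX arithmetic expansion, accepted by bash, dash, and zsh alike. A few more operations on the same variables; note that shell arithmetic is integer-only.

```shell
a=5
b=3
echo $((a * b))     # 15
echo $((a + b))     # 8
echo $((a / b))     # 1: integer division truncates
echo $((a % b))     # 2: remainder
c=$((a * b + 1))    # store a result in a new variable
echo "$c"           # 16
```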

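# Strings and command substitution
"text $a"        # Double quotes: a string in which variables are expanded
'text $a'        # Single quotes: a literal string, nothing is expanded
`cmd`            # Runs cmd and substitutes its output (modern form: $(cmd))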
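A short sketch of quoting and command substitution. $(...) does the same job as backticks but nests without extra escaping, which is why it is generally preferred in new scripts. The variable names are illustrative.

```shell
name="Linux"
echo "Hello, $name"     # double quotes expand variables: Hello, Linux
echo 'Hello, $name'     # single quotes keep the text literal: Hello, $name

year=`date +%Y`         # backticks: the older command-substitution syntax
year=$(date +%Y)        # $(...): same result, preferred in modern scripts
echo "Year: $year"

# $(...) nests cleanly; the inner command produces two lines
lines=$(echo "$(printf 'a\nb\n')" | wc -l | tr -d ' ')
echo "$lines"           # 2
```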

2. What were 2-3 interesting points?

The grep command, which supports advanced pattern matching with regular expressions, making it a versatile tool for data filtering.
The use of tr for text transformation and cleaning (e.g., case conversion and character removal), which is useful in automation scripts.
The combination of multiple commands using pipes (|) to create efficient one-liners, like sorting files by size or counting lines that match a specific pattern.

3. What were 2-3 points you didn’t understand?

The advanced grep patterns, such as grep '[0-9]\{3\}-[0-9]\{4\}', deserve a more detailed explanation; they can be cryptic to anyone unfamiliar with regular expressions.
The difference between using backticks (`cmd`) and other ways of running commands in scripts wasn't fully explained.
How file redirection with < interacts with commands like cat or wc might not be obvious to beginners.

4. Where does this skill or best practice fit?

Scripting and Automation: These command-line skills are essential for automating repetitive tasks, such as file management, data processing, and system monitoring.
System Administration: Tools like ls, grep, sort, and wc are crucial for managing and analyzing logs, directories, and system files.
Data Analysis and Cleaning: Commands like cut, tr, and grep are useful for preprocessing and cleaning data before further analysis in scripts or programs.
Development Workflows: These skills fit naturally into CI/CD pipelines and version control workflows, for example to filter and transform data in build or review scripts.
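As an illustration of the data-cleaning use case, here is a sketch that chains tr, tail, cut, and sort on a made-up CSV export with Windows line endings, inconsistent case, and a duplicate. The raw_export and clean_cities functions are hypothetical.

```shell
# Made-up raw export: CRLF line endings, mixed case, one duplicate city
raw_export() {
    printf 'City,Country\r\nCLUJ,Romania\r\nIasi,Romania\r\ncluj,Romania\r\n'
}

# Hypothetical cleaning pipeline: one small tool per step
clean_cities() {
    raw_export |
        tr -d '\r' |                  # strip Windows carriage returns
        tail -n +2 |                  # drop the header line
        cut -d ',' -f 1 |             # keep only the first column
        tr '[:upper:]' '[:lower:]' |  # normalize case
        sort -u                       # sort and drop duplicates
}

clean_cities    # cluj, iasi
```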

