Data Engineering Journey - no. 06
I have been a Linux user for over 14 years. My first encounter with it was during high school, and later, I delved deeper into it during my university studies. From that moment, I really liked Linux as an operating system, especially because it ran smoothly on my damaged computer without throwing blue screens of death (BSOD), as Windows often did.
The first Linux distribution I installed, initially on a virtual machine and later on a dedicated system partition, was Mandriva. After that, I moved on to Fedora, with which I had more extensive experience, followed by Ubuntu, and eventually Arch Linux.
Over time, I experimented with other distributions, including CentOS, Rocky Linux, and AlmaLinux.
When it comes to Linux, there’s a wealth of information available online. Back in the days before Discord, we relied on IRC and online forums dedicated to Linux-related topics. But coming back to the present, I think it’s high time to write some notes about Linux and perhaps share a few recommendations for tools that are most commonly used in this environment, regardless of the distribution you’re working with.
1. What have you learned?
2. What were 2-3 interesting points?
- Focus on specific commands like grep, which allows for advanced pattern matching in text processing, making it a versatile tool for data filtering.
- The use of tr for text transformation and cleaning (e.g., case conversion and character removal), which is useful in automation scripts.
- The combination of multiple commands using pipes (|) to create efficient one-liners, like sorting files by size or counting lines that match a specific pattern.
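To make these three points concrete, here is a minimal sketch that runs in a throwaway directory. The file names (app.log, small.txt) and their contents are made up for illustration, and the script assumes a POSIX-style shell with the usual grep, tr, ls, head, and wc utilities.

```shell
# Work in a temporary directory so nothing on the real system is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files, invented for this example.
printf '%s\n' \
  'INFO  job started' \
  'ERROR disk full' \
  'info  retrying' \
  'ERROR timeout' > app.log
printf 'hi\n' > small.txt

# grep: filter lines by pattern; -c prints the number of matching lines.
error_count=$(grep -c 'ERROR' app.log)

# tr: convert to lowercase, then delete punctuation characters.
cleaned=$(printf 'Hello, World!\n' | tr '[:upper:]' '[:lower:]' | tr -d '[:punct:]')

# Pipe: count lines that match a pattern regardless of case.
# The final tr strips the padding some wc implementations add.
info_count=$(grep -i 'info' app.log | wc -l | tr -d ' ')

# Pipe: list files sorted by size (largest first) and keep the top entry.
largest=$(ls -S | head -n 1)

echo "errors=$error_count cleaned=$cleaned info=$info_count largest=$largest"
```

Each command does one small job, and the pipe simply hands the output of one command to the next as its input.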
3. What were 2-3 points you didn’t understand?
- The detailed explanation of advanced grep patterns, such as grep -E '[0-9]{3}-[0-9]{4}', could be more straightforward. It might be unclear to someone unfamiliar with regex.
- The distinction between using backticks (`cmd`) and other methods for executing commands in scripts wasn’t fully explained.
- How file redirection with < interacts with commands like cat or wc might not be obvious to beginners.
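For my own notes, here is a small sketch that walks through the three unclear points. The file contacts.txt and its contents are hypothetical, and the paths passed to basename and dirname are only strings, so they do not need to exist. One detail worth knowing: the {3} repetition syntax only works as written with grep -E (extended regex); in basic mode the braces would have to be escaped as \{3\}.

```shell
# Work in a temporary directory so nothing on the real system is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical contact list, invented for this example.
printf '%s\n' \
  'alice 555-1234' \
  'bob   no phone' \
  'carol 555-9876' > contacts.txt

# Regex: [0-9]{3} is "exactly three digits", then a literal hyphen,
# then [0-9]{4} is "exactly four digits". -E enables the {n} syntax.
phone_lines=$(grep -cE '[0-9]{3}-[0-9]{4}' contacts.txt)

# Command substitution: backticks and $(...) both run a command
# and substitute its output into the script.
old_style=`basename /var/log/syslog`
new_style=$(basename /var/log/syslog)

# $(...) nests cleanly; nested backticks would need backslash escapes.
nested=$(basename "$(dirname /var/log/syslog)")

# Redirection: with a file argument, wc opens the file itself and
# prints the file name next to the count.
with_arg=$(wc -l contacts.txt)

# With <, the shell opens the file and connects it to standard input,
# so wc never sees a name and prints only the count.
with_redirect=$(wc -l < contacts.txt | tr -d ' ')

# cat produces the same output either way.
from_arg=$(cat contacts.txt)
from_stdin=$(cat < contacts.txt)

echo "phones=$phone_lines nested=$nested with_arg=$with_arg with_redirect=$with_redirect"
```

The $(...) form is generally easier to read and to nest, which is why it tends to be preferred over backticks in newer scripts.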
4. Where does this skill or best practice fit?
- Scripting and Automation: These command-line skills are essential for automating repetitive tasks, such as file management, data processing, and system monitoring.
- System Administration: Tools like ls, grep, sort, and wc are crucial for managing and analyzing logs, directories, and system files.
- Data Analysis and Cleaning: Commands like cut, tr, and grep are useful for preprocessing and cleaning data before further analysis in scripts or programs.
- Development Workflows: These skills carry over to CI/CD pipelines and to work with version control systems, where they help filter and manipulate data.
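As an illustration of the data cleaning use case, the sketch below preprocesses a tiny CSV file with cut, tr, and grep. The file users.csv, its columns, and its values are all made up for this example; a real export with quoted fields or embedded commas would need a proper CSV parser instead.

```shell
# Work in a temporary directory so nothing on the real system is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical CSV export with inconsistent casing in the city column.
printf '%s\n' \
  'id,name,city' \
  '1,Alice,WARSAW' \
  '2,Bob,Krakow' \
  '3,Carol,warsaw' > users.csv

# Skip the header (tail -n +2), take the third column (cut),
# normalize case (tr), and keep each distinct value once (sort -u).
cities=$(tail -n +2 users.csv | cut -d',' -f3 | tr '[:upper:]' '[:lower:]' | sort -u)

# Count rows whose last field is "warsaw" in any letter case.
warsaw_rows=$(grep -ci ',warsaw$' users.csv)

echo "$cities"
echo "warsaw_rows=$warsaw_rows"
```

The same kind of pipeline can run as a step in a script or a CI job, since it reads plain text and needs nothing beyond standard utilities.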