We need to fetch files from an SFTP location on Linux, but only the latest (unprocessed) ones. As part of our ETL process, we identify these unprocessed files by the date stamp in their names. First we fetch all the file names, and then we download only the eligible files. Below I outline how to implement this.
Private/Public key files
First, we create a private/public key pair for accessing the SFTP server, so that no password is needed to log in. Linux provides the ssh-keygen command to create these files: one is the private key and the other is the public key. Both are created on the local (client) Linux machine. After creating them, deploy the public key to the SFTP server (on an OpenSSH server it is typically appended to the SFTP user's ~/.ssh/authorized_keys file). One point to remember: the private key on the local machine needs restrictive permissions, so that only its owner can read it (for example, chmod 600). If other users or groups can read the private key, ssh refuses to use it. For more details, see:
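A minimal sketch of the one-time key setup, assuming OpenSSH's ssh-keygen is installed on the client. The function names, the default key path ~/.ssh/sftp_etl_key, and the Ed25519 key type are illustrative choices, not part of the original article.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the one-time key setup; names and the default
# key path are illustrative assumptions.

# OpenSSH refuses a private key that other users can read, so limit the file
# to read/write for its owner only. ssh-keygen already creates keys this way;
# this also fixes a key that was copied in from elsewhere.
set_private_key_permissions() {
    chmod 600 "$1"
}

# Create an Ed25519 key pair with an empty passphrase (-N "") so that scripts
# can log in unattended. Produces <key_path> (private) and <key_path>.pub (public).
create_sftp_key_pair() {
    local key_path="${1:-$HOME/.ssh/sftp_etl_key}"
    if [[ -e "$key_path" ]]; then
        echo "Key already exists, not overwriting: $key_path" >&2
        return 1
    fi
    mkdir -p "$(dirname "$key_path")" || return 1
    ssh-keygen -t ed25519 -N "" -f "$key_path" -q || return 1
    set_private_key_permissions "$key_path"
}
```

The public key can then be deployed with ssh-copy-id -i ~/.ssh/sftp_etl_key.pub -p 22 etl_user@sftp.example.com (user, host, and port are placeholders). ssh-copy-id needs shell access on the server; for SFTP-only accounts, the server administrator usually installs the public key instead.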
Approach to fetching
Suppose the SFTP location contains all files, including ones already processed or archived, but we only need to download the latest (unprocessed) files, that is, files in a specific date range, and each file name contains a date stamp. In that case, we first fetch all file names from the SFTP location, use those names to select the files that are eligible for download, and download only those. This avoids downloading unnecessary files, which saves I/O and network bandwidth and improves the application's performance. To achieve this, we create an SFTP interaction step in our bash script.
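The selection step described above can be sketched as a small bash function. This assumes each relevant file name contains an 8-digit YYYYMMDD stamp (for example sales_20240115.csv) and that the ETL job knows the date of the newest file it has already processed; the function name and the naming pattern are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical selection step: print the names whose date stamp is newer than
# the last processed date. Assumes each eligible name contains an 8-digit
# YYYYMMDD stamp (e.g. sales_20240115.csv); names without one are skipped.
select_unprocessed_files() {
    local name_list_file="${1}"   # one file name per line
    local last_processed="${2}"   # YYYYMMDD of the newest file already processed
    local stamp_pattern='([0-9]{8})'
    local name
    # "|| [[ -n $name ]]" also handles a last line without a trailing newline.
    while IFS= read -r name || [[ -n "$name" ]]; do
        if [[ $name =~ $stamp_pattern ]]; then
            # Same-length YYYYMMDD values compare correctly as base-10 integers.
            if (( 10#${BASH_REMATCH[1]} > 10#$last_processed )); then
                printf '%s\n' "$name"
            fi
        fi
    done < "$name_list_file"
}
```

For example, select_unprocessed_files all_file_name_list.txt 20240110 > eligible_files.txt keeps only names stamped after 10 January 2024. Where the last processed date is stored (a control table, a state file, and so on) depends on the ETL setup; an upper bound for the date range can be added with a second comparison in the same way.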
# Fetch all file names from the SFTP source directory and store them in a
# file (all_file_name_list.txt), one file name per line.
function get_sftp_file_list() {
    local sftp_host="${1}"          # Host name
    local sftp_port="${2}"          # Port number
    local sftp_user="${3}"          # User name
    local sftp_source_path="${4}"   # SFTP source directory
    local sftp_identity="${5}"      # Private key file path

    # The lines between "<< !" and "!" are sftp commands, not shell commands.
    # sftp's standard output (the listing) is written to all_file_name_list.txt.
    sftp -P "$sftp_port" -i "$sftp_identity" "$sftp_user@$sftp_host" << ! > all_file_name_list.txt
cd "$sftp_source_path"
ls -1
!
}
Above, I created a bash function that uses the sftp command. Note that sftp has its own set of commands for interacting with the SFTP service (such as cd, ls, get, and put). They are not Linux commands, even though some of them look similar, so don't confuse the two.
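To make the difference concrete, here is a hedged sketch that writes sftp's own commands (cd, lcd, get) into a batch file for the download step. The helper name build_download_batch and the directory names are hypothetical; lcd changes the local directory, while cd changes the remote one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print sftp commands (not shell commands) that download
# the given files. cd/lcd/get here are interpreted by sftp, not by bash.
build_download_batch() {
    local remote_dir="${1}"   # directory on the SFTP server
    local local_dir="${2}"    # directory on this machine
    shift 2
    printf 'cd "%s"\n' "$remote_dir"
    printf 'lcd "%s"\n' "$local_dir"
    local name
    for name in "$@"; do
        printf 'get "%s"\n' "$name"
    done
}

# Example use (paths and variables are placeholders):
#   mapfile -t files < eligible_files.txt
#   build_download_batch /outbound /data/incoming "${files[@]}" > download_batch.txt
#   sftp -b download_batch.txt -P "$sftp_port" -i "$sftp_identity" "$sftp_user@$sftp_host"
```

With -b, sftp reads commands from the batch file, runs without user interaction, and aborts when a command such as cd or get fails, which suits an unattended ETL job.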
The get_sftp_file_list function takes the following inputs as parameters:
- Host name
- Port number
- SFTP user name
- SFTP source directory
- Private key file path
All of these arguments are required to access the SFTP location. Inside the function, I use the sftp command. (For more about sftp, see its man page: type man sftp at a Linux command prompt.)
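Since the function cannot work without all five arguments, a small guard can catch mistakes before sftp is even started. This is a minimal sketch; the name check_sftp_args and the specific checks (numeric port, readable key file) are assumptions, not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail early if an argument is missing or unusable.
# Expected order: HOST PORT USER SOURCE_PATH KEY_FILE.
check_sftp_args() {
    if (( $# != 5 )); then
        echo "usage: get_sftp_file_list HOST PORT USER SOURCE_PATH KEY_FILE" >&2
        return 1
    fi
    local arg
    for arg in "$@"; do
        [[ -n "$arg" ]] || { echo "empty argument" >&2; return 1; }
    done
    # The port must be a plain number.
    [[ $2 =~ ^[0-9]+$ ]] || { echo "port must be numeric: $2" >&2; return 1; }
    # The private key must exist and be readable by the current user.
    [[ -r "$5" ]] || { echo "cannot read private key: $5" >&2; return 1; }
}
```

It could be called as the first line of get_sftp_file_list (check_sftp_args "$@" || return 1). The function itself would then be invoked as, for example, get_sftp_file_list sftp.example.com 22 etl_user /outbound ~/.ssh/sftp_etl_key, where all values are placeholders.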
The command syntax is:
sftp -P "$sftp_port" -i "$sftp_identity" "$sftp_user@$sftp_host" << ! > all_file_name_list.txt
!
Most of this code is self-explanatory, so I will explain only a few points.
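It starts with << ! and ends with a line containing only !. Everything between those two lines (a here-document) is passed to sftp as input, and sftp runs each line as one of its own commands. Because the delimiter is unquoted, bash expands $sftp_source_path before the text reaches sftp. Inside that block, we first change to the source directory: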
cd "$sftp_source_path"
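To see exactly what sftp receives from such a here-document, the sketch below swaps sftp for cat, which simply prints its input. The function name show_sftp_input is hypothetical and exists only for this illustration.

```shell
#!/usr/bin/env bash
# Illustration only: cat stands in for sftp so the here-document is visible.
# bash expands $sftp_source_path first; every line up to the line containing
# only "!" becomes the command's standard input.
show_sftp_input() {
    local sftp_source_path="${1}"
    cat << !
cd "$sftp_source_path"
ls -1
!
}
```

Calling show_sftp_input /outbound/daily prints cd "/outbound/daily" followed by ls -1. If the delimiter were quoted (<< '!'), bash would not expand the variable, and sftp would receive the literal text $sftp_source_path.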
Then we list the files:
ls -1
This retrieves all file names from the current SFTP directory and stores them in the all_file_name_list.txt file, one file name per line. One point to remember: sftp echoes each command it reads from non-interactive input, so lines such as sftp> cd ... and sftp> ls -1 also end up in that file. Remove those lines programmatically, or ignore them at run time based on predefined logic.
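One way to remove those lines programmatically is a grep filter. This sketch assumes each echoed command starts with the "sftp>" prefix, as OpenSSH's sftp prints it, and that no real file name starts with that text; the function name clean_file_name_list is hypothetical. Note that ls -1 also lists subdirectories, which this filter keeps.

```shell
#!/usr/bin/env bash
# Hypothetical cleanup step: keep only real file names from the sftp output.
# Drops echoed command lines (starting with "sftp>") and empty lines.
clean_file_name_list() {
    local raw_list="${1}"     # e.g. all_file_name_list.txt
    local clean_list="${2}"   # output: one file name per line
    local rc=0
    grep -v -e '^sftp>' -e '^$' "$raw_list" > "$clean_list" || rc=$?
    # grep exits 1 when no lines remain (empty directory); treat that as
    # success, but still fail on real errors (exit status 2).
    (( rc <= 1 ))
}
```

The cleaned list can then feed the date-stamp selection described earlier, so that only eligible files are downloaded.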
Happy coding!!!