Edit

After a long process of roaming the web, re-runs and troubleshoot the script with this wonderful community, the script is functional and does what it’s intended to do. The script itself is probably even further improvable in terms of efficiency/logic, but I lack the necessary skills/knowledge to do so, feel free to copy, edit or even propose a more efficient way of doing the same thing.

I’m greatly thankful to @[email protected], @[email protected], @[email protected] and Phil Harvey (exiftool) for their help, time and all the great idea’s (and spoon-feeding me with simple and comprehensive examples ! )

How to use

Prerequisites:

  • parallel package installed on your distribution

Copy/past the below script in a file and make it executable. Change the start_range/end_range to your needs and install the parallel package depending on your OS and run the following command:

time find /path/to/your/image/directory/ -type f | parallel ./script-name.sh

This will order only the pictures from your specified time range into the following structure YEAR/MONTH in your current directory from 5 different time tag/timestamps (DateTimeOriginal, CreateDate, FileModifyDate, ModifyDate, DateAcquired).

You may want to swap ModifyDate and FileModifyDate in the script, because ModifyDate is more accurate in a sense that FileModifyDate is easily changeable (as soon as you make some modification to the pictures, this will change to your current date). I needed that order for my specific use case.

From: '-directory<$DateAcquired/' '-directory<$ModifyDate/' '-directory<$FileModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/'

To: '-directory<$DateAcquired/' '-directory<$FileModifyDate/' '-directory<$ModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/'

As per exfitool’s documentation:

ExifTool evaluates the command-line arguments left to right, and latter assignments to the same tag override earlier ones.

#!/bin/bash

if [ $# -eq 0 ]; then
    echo "Usage: $0 <filename>"
    exit 1
fi

# Concatenate all arguments into one string for the filename, so calling "./script.sh /path/with spaces.jpg" should work without quoting
filename="$*"

start_range=20170101
end_range=20201230

FIRST_DATE=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$filename" | tr -d '-' | awk '{print $1}')

if [[ "$FIRST_DATE" != '' ]] && [[ "$FIRST_DATE" -gt $start_range ]] && [[ "$FIRST_DATE" -lt $end_range ]]; then
        exiftool -api QuickTimeUTC -d %Y/%B '-directory<$DateAcquired/' '-directory<$ModifyDate/' '-directory<$FileModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$filename"

else
        echo "Not in the specified time range"

fi



Hi everyone !

Please no bash-shaming, I did my outmost best to somehow put everything together and make it somehow work without any prior bash programming knowledge. It took me a lot of effort and time.

While I’m pretty happy with the result, I find the execution time very slow: 16min for 2288 files.

On a big folder with approximately 50,062 files, this would take over 6 hours !!!

If someone could have a look and give me some easy to understand hints, I would greatly appreciate it.

What Am I trying to achieve ?

Create a bash script that use exiftool to stripe the date from images in a readable format (20240101) and compare it with an end_range to order only images from that specific date range (ex: 2020-01-01 -> 2020-12-30).

Also, some images lost some EXIF data, so I have to loop through specific time fields:

  • DateTimeOriginal
  • CreateDate
  • FileModifyDate
  • DateAcquired

The script in question

#!/bin/bash

shopt -s globstar

folder_name=/home/user/Pictures
start_range=20170101
end_range=20180130


for filename in $folder_name/**/*; do

	if [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal "$filename") =~ ^[0-9]+$ ]]; then
		DateTimeOriginal=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -DateTimeOriginal "$filename")
	        if  [ "$DateTimeOriginal" -gt $start_range ] && [ "$DateTimeOriginal" -lt $end_range ]; then
			/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$filename"
			echo "Found a value"
		echo "Okay its $(tput setab 22)DateTimeOriginal$(tput sgr0)"

		fi

        elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -CreateDate "$filename") =~ ^[0-9]+$ ]]; then
                CreateDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -CreateDate "$filename")
                if  [ "$CreateDate" -gt $start_range ] && [ "$CreateDate" -lt $end_range ]; then
                        /usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$CreateDate/' '-FileName=%f%-c.%e' "$filename"
                        echo "Found a value"
                echo "Okay its $(tput setab 27)CreateDate$(tput sgr0)"
                fi

        elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -FileModifyDate "$filename") =~ ^[0-9]+$ ]]; then
                FileModifyDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -FileModifyDate "$filename")
                if  [ "$FileModifyDate" -gt $start_range ] && [ "$FileModifyDate" -lt $end_range ]; then
                        /usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$FileModifyDate/' '-FileName=%f%-c.%e' "$filename"
                        echo "Found a value"
                echo "Okay its $(tput setab 202)FileModifyDate$(tput sgr0)"
                fi


        elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateAcquired "$filename") =~ ^[0-9]+$ ]]; then
                DateAcquired=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -DateAcquired "$filename")
                if  [ "$DateAcquired" -gt $start_range ] && [ "$DateAcquired" -lt $end_range ]; then
                        /usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateAcquired/' '-FileName=%f%-c.%e' "$filename"
                        echo "Found a value"
                echo "Okay its $(tput setab 172)DateAcquired(tput sgr0)"
                fi

        elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -ModifyDate "$filename") =~ ^[0-9]+$ ]]; then
                ModifyDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -ModifyDate "$filename")
                if  [ "$ModifyDate" -gt $start_range ] && [ "$ModifyDate" -lt $end_range ]; then
                        /usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$ModifyDate/' '-FileName=%f%-c.%e' "$filename"
                        echo "Found a value"
                echo "Okay its $(tput setab 135)ModifyDate(tput sgr0)"
                fi

        else
                echo "No EXIF field found"

done

Things I have tried

  1. Reducing the number of if calls

But it didn’t much improve the execution time (maybe a few ms?). The syntax looks way less readable but what I did, was to add a lot of or ( || ) in the syntax to reduce to a single if call. It’s not finished, I just gave it a test drive with 2 EXIF fields (DateTimeOriginal and CreateDate) to see if it could somehow improve time. But meeeh :/.

#!/bin/bash

shopt -s globstar

folder_name=/home/user/Pictures
start_range=20170101
end_range=20201230

for filename in $folder_name/**/*; do

        if [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal "$filename") =~ ^[0-9]+$ ]] || [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -CreateDate "$filename") =~ ^[0-9]+$ ]]; then
                DateTimeOriginal=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -DateTimeOriginal "$filename")
		CreateDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -CreateDate "$filename")
                if  [ "$DateTimeOriginal" -gt $start_range ] && [ "$DateTimeOriginal" -lt $end_range ] || [ "$CreateDate" -gt $start_range ] && [ "$CreateDate" -lt $end_range ]; then
                        /usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateTimeOriginal/' '-directory<$CreateDate/' '-FileName=%f%-c.%e' "$filename"
                        echo "Found a value"
                echo "Okay its $(tput setab 22)DateTimeOriginal$(tput sgr0)"

                else
			echo "FINISH YOUR SYNTAX !!"
		fi

	fi
done

  1. Playing around with find

To recursively find my image files in all my folders I first tried the find function, but that gave me a lot of headaches… When my image file name had some spaces in it, it just broke the image path strangely… And all answers I found on the web were gibberish, and I couldn’t make it work in my script properly… Lost over 4 yours only on that specific issue !

To overcome the hurdle someone suggest to use shopt -s globstar with for filename in $folder_name/**/* and this works perfectly. But I have no idea If this could be the culprit of slow execution time?

  1. Changing all [ ] into [[ ]]

That also didn’t do the trick.

How to Improve the processing time ?

I have no Idea if it’s related to my script or the exiftool call that makes the script so slow. This isn’t that much of a complicated script, I mean, it’s a comparison between 2 integers not a hashing of complex numbers.

I hope someone could guide me in the right direction :)

Thanks !

  • GenderNeutralBro@lemmy.sdf.org
    link
    fedilink
    English
    arrow-up
    3
    ·
    edit-2
    4 months ago

    Glad it’s working! Couple more quick ideas:

    Since you’re looping through the results of find, $file_path will be a single path name, so you don’t need to loop over it with for images in $file_path; anymore.

    I think you’re checking each field of the results in its own if statement, e.g. if [[ $(echo $ALL_DATES | awk '{print $1}')... then if [[ $(echo $ALL_DATES | awk '{print $2}')... etc. While I don’t think this is hurting performance significantly, it would make your code easier to read and maintain if you first found the correct date, and then did only one comparison operation on it.

    For example, exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$file_path" returns five columns, which contain either a date or “-”, and it looks like you’re using the first column that contains a valid date. You can try something like this to grab the first date more easily, then just use that from then on:

    FIRST_DATE=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$file_path" | tr -d '-' | awk '{print $1}')

    tr -d '-' will delete all occurrences of ‘-’. That means the result will only contain whitespace and valid dates, so awk '{print $1}' will print the first valid date. Then you can simply have one if statement:

    if [[ "$FIRST_DATE" != '' ]] && [[ "$FIRST_DATE" -gt $start_range ]] && [[ "$FIRST_DATE" -lt $end_range ]]; then

    Hope this helps!

    • N0x0n@lemmy.mlOP
      link
      fedilink
      English
      arrow-up
      2
      ·
      edit-2
      4 months ago

      Update

      I found something interesting ! It seems that the tag FileModifyDate is not being processed in the script! After removing all time tags except FileModifyDate the file is not even processed, it directly goes to the else statement ! I’m still digging :D


      Hey again !

      Thank you very much for sharing your knowledge and ELI !

      I gave it a try and while I understand what it does (thanks to your concise and easy to understand examples) and how it should be implemented exiftool seems to behave very strangely (maybe a bug, but I guess skill issue).

      FIRST_DATE=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$file_path" | tr -d '-' | awk '{print $1}')
      
      if [[ "$FIRST_DATE" != '' ]] && [[ "$FIRST_DATE" -gt $start_range ]] && [[ "$FIRST_DATE" -lt $end_range ]]; then
      
      	/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -d %Y/%B '-directory<$DateTimeOriginal/' '-directory<$CreateDate/' '-directory<$FileModifyDate/' '-directory<$DateAcquired/' '-directory<$ModifyDate/' '-FileName=%f%-c.%e' "$file_path"
      

      As per the exiftool documentation (source example 12)

      -directory<$DateTimeOriginal/' '-directory<$CreateDate/' '-directory<$FileModifyDate/' '-directory<$DateAcquired/' '-directory<$ModifyDate/' 
      

      In this command the Directory tag is set from multiple other date/time tags. ExifTool evaluates the command-line arguments left to right, and latter assignments to the same tag override earlier ones, so the Directory for each image is ultimately set by the rightmost copy argument that is valid for that image.

      However, it sometimes skip the DateTimeOriginal field and takes FileModifyDate instead even if the first one is present. My guess is that exiftool needs more time to correctly process the file, but it’s only a guess ! Because with the for loop and all elif calls it works without any issues.

      Thanks again for your insightful help :)


      Side note: I gave a test run with only one time field to see If there is any time gain with calling only the first valid date, while it seems ~2ms slower per file I think it would really make the difference without all the elif calls on the long run !!

      Thanks again !!

      • GenderNeutralBro@lemmy.sdf.org
        link
        fedilink
        English
        arrow-up
        1
        ·
        edit-2
        4 months ago

        Off the top of my head I’m not sure why that would be. To troubleshoot, it might help to print the output every step of the way so you can see if there are any oddities. Something like this perhaps, in place of the FIRST_DATE= line.

        echo $file_path
        EXIF_OUT=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$file_path")
        echo "$EXIF_OUT"
        EXIF_FILTERED=$(echo "$EXIF_OUT" | tr -d '')
        echo "$EXIF_FILTERED"
        FIRST_DATE=$(echo "$EXIF_FILTERED" | awk '{print $1}')
        echo "$FIRST_DATE"
        
        • N0x0n@lemmy.mlOP
          link
          fedilink
          English
          arrow-up
          2
          ·
          edit-2
          4 months ago

          Heyho :)

          Back again ! Thanks you for helping troubleshoot !!! It’s was actually something else… (my reading skills) !

          ExifTool evaluates the command-line arguments left to right, and latter assignments to the same tag override earlier ones.

          That’s why everything was messed up, because it took the last assignment to write the directory date… I feel quite stupid/bad and added unnecessarily noise to Phil Harvey’s forum :/.


          Thanks to you and another user I greatly improved my script and went from 16min to 1min21s for the exact same batch ! He/she proposed to use the parallel package alongside the script to make full use of my CPU cores ! Also, another way to loop through my files. It’s a mix of both your ideas and it looks so much better ! Thank you very much for your help ! 😘

          #!/bin/bash
          
          if [ $# -eq 0 ]; then
              echo "Usage: $0 <filename>"
              exit 1
          fi
          
          # Concatenate all arguments into one string for the filename, so calling "./script.sh /path/with spaces.jpg" should work without quoting
          filename="$*"
          
          start_range=20170101
          end_range=20201230
          
          FIRST_DATE=$(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$filename" | tr -d '-' | awk '{print $1}')
          #echo $FIRST_DATE
          
          if [[ "$FIRST_DATE" != '' ]] && [[ "$FIRST_DATE" -gt $start_range ]] && [[ "$FIRST_DATE" -lt $end_range ]]; then
                  /usr/bin/vendor_perl/exiftool -api QuickTimeUTC -d %Y/%B '-directory<$DateAcquired/' '-directory<$ModifyDate/' '-directory<$FileModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$filename"
          
          else
                  echo "Error"
          
          fi
          

          time find /home/user/Pictures/ -type f | parallel ./exif-test.bash