AernaLingus [any]

  • 0 Posts
  • 26 Comments
Joined 3 years ago
Cake day: May 6th, 2022


  • From 2008 to 2011, Li made CRACK99 a reliable black-market marketplace, one that netted an estimated $100 million in sales. His inventory, investigators later said, was valued at over $1 billion.

    Since it’s not clear from this write-up: those eye-popping figures (the ones concocted by the Department of Justice) are derived from the prices the original companies charged for the licenses, so it’s not $100 million in sales but $100 million in “value” (the idea of calculating a $1 billion valuation for the digital “inventory” is even more ridiculous). If you look at the actual CRACK99 website, you’ll see that most of the cracked software was being sold for anywhere from twenty bucks to maybe a few hundred dollars; this guy was not making millions from this. The government’s sentencing memorandum has the details, including the absurd figure of $3,812,241.57 for a single license of some CAD software called “Catia VR520”, which Li sold to at least one customer for the princely sum of $100.





  • I’ll preface this by saying I’m working my way through the Rust book too (just a bit further along), so don’t take my word as gospel.

    This exact scenario is what the ? operator was designed for: returning early with the Err if one is received[1], otherwise unpacking the Ok. As you’ve discovered, it’s a common pattern, so using the ? operator greatly cuts down on the boilerplate. If you wanted to do the equivalent of what you have here (panicking instead of returning the Err for it to potentially be handled in calling code, albeit without your custom panic messages[2]), you could achieve this with unwrap() instead of ?:

    let html_content_text = reqwest::blocking::get(&permalink).unwrap().text().unwrap();
    

    Both of these will be covered in chapter 9.

    If you want to avoid those constructs until later, the only thing I’d say is that some of the intermediate variables seem unnecessary since you can match on the function call directly:

    fn get_document(permalink: String) -> Html {
        let html_content = match reqwest::blocking::get(&permalink) {
            Ok(response) => response,
            Err(error) => panic!("There was an error making the request: {:?}", error),
        };

        let html_content_text = match html_content.text() {
            Ok(text) => text,
            Err(error) => panic!(
                "There was an error getting the html text from the content of the response: {:?}",
                error
            ),
        };

        let document = Html::parse_document(&html_content_text);

        document
    }
    

    You could also eliminate the final let statement and just stick the parse_document call at the end, but that’s a matter of preference; I know having an intermediate variable before a return can sometimes make debugging easier.

    As for whether you should build something now or wait until you’ve learned more: go with your gut! The most important thing is that you stay actively engaged with the material, and many people find that diving into projects as soon as possible helps them learn and stay motivated. You could also use rustlings and/or Rust by Example as you go through the book, which is what I’ve been doing (specifically rustlings). It’s not as stimulating as writing a project from scratch, but it does let you write some relevant code. And if you’re not already, I highly recommend using the Brown version of the Rust Book, which includes interactive quizzes sprinkled throughout. I’ve found them particularly helpful for understanding the quirks of the borrow checker, a topic the book keeps revisiting.


    1. There’s also some conversion of the error type (via the From trait), but that’s beyond the scope of your question ↩︎

    2. edit: you can use expect to get the custom messages, as covered in another comment; not sure how I forgot that ↩︎





  • In text form:

    Abstract

    Amid the current U.S.-China technological race, the U.S. has imposed export controls to deny China access to strategic technologies. We document that these measures prompted a broad-based decoupling of U.S. and Chinese supply chains. Once their Chinese customers are subject to export controls, U.S. suppliers are more likely to terminate relations with Chinese customers, including those not targeted by export controls. However, we find no evidence of reshoring or friend-shoring. As a result of these disruptions, affected suppliers have negative abnormal stock returns, wiping out $130 billion in market capitalization, and experience a drop in bank lending, profitability, and employment.

    Quote from conclusion

    Moreover, the benefits of U.S. export controls, namely denying China access to advanced technology, may be limited as a result of Chinese strategic behavior. Indeed, there is evidence that, following U.S. export controls, China has boosted domestic innovation and self-reliance, and increased purchases from non-U.S. firms that produce similar technology to the U.S.-made ones subject to export controls.


  • Yeah, I know that feeling. I posted and added unnecessary noise to Phil Harvey’s forum about something I thought was a “bug” or odd behavior in ExifTool, while it was just my lacking reading skills… I felt so dumb :/

    Happens to the best of us! As long as you make a genuine effort to find a solution, I think most people will be happy to help regardless.

    As for the version of the unique-name code you wrote: you’ve got the spirit! The problem is that the try block will only catch the exception the first time around, so if there are two duplicates the uncaught exception will terminate the script. Separately, when working with exceptions it’s important to be mindful of which particular exceptions the code in the try block might throw, and when. In this case, if the move is to another directory on the same filesystem, shutil.move matches the behavior of os.rename, which throws different types of exceptions depending on what goes wrong and which operating system you’re on. Importantly, on Windows it will throw an exception if the destination file already exists, but this generally won’t occur on Unix: the file will be silently overwritten instead.

    (Actually, I just realized that this may be an issue with pasting in your Python code messing up the indentation, one of the flaws of Python. If your actual code looked like the below, with the duplicate check run against the destination directory, I think it would work:)

        try:
          shutil.move(d['SourceFile'], subdirectory)
        except OSError:
          # Name collision: try -1, -2, ... suffixes until the candidate
          # name is free in the destination directory
          base_name, extension = os.path.splitext(d['SourceFile'])
          new_filename = d['SourceFile']
          i = 0
          while os.path.exists(os.path.join(subdirectory, os.path.basename(new_filename))):
            i += 1
            new_filename = f"{base_name}-{i}{extension}"
          print(new_filename)
          os.rename(d['SourceFile'], new_filename)
          shutil.move(new_filename, subdirectory)
    

    (Oh, and I should have mentioned this earlier: for Markdown parsers that support it, including Lemmy and GitHub, if you put the name of the language you’re writing in right after your opening triple ` (e.g. ```python or ```bash), it’ll give you syntax highlighting for that language, although not as complete as what you’d see in an actual code editor.)

    Really cool that you figured out how to do it with exiftool natively; I’ll be honest, I probably wouldn’t have persevered enough to come up with that had it been me! Very interesting that it ended up being slower than the Python script, which I wouldn’t have expected. One thing that comes to mind is that my script more or less separates the reads from the writes: first it reads all the metadata, then it moves all the files (there are also reads to check for file existence in the per-file operations, but my understanding is that those happen in compact, contiguous areas of the drive and the amount of data read is tiny). If exiftool performs the entire operation one file at a time, it might end up being slower due to how storage access works.


    Happy to have been able to help! Best of luck to you.


  • Wow, nice find! I was going to handle it by just arbitrarily picking the first tag ending with CreateDate, FileModifyDate, etc., but this is a much better solution that relies on the native behavior of exiftool. I feel kind of silly for not reading the documentation more carefully: I couldn’t find anything immediately useful in the docs for the class used in the script (ExifToolHelper), but with the benefit of hindsight I now see this crucial detail about its parameters:

    All other parameters are passed directly to the super-class constructor: exiftool.ExifTool.__init__()

    And sure enough, that’s where the common_args parameter is detailed which handles this exact use case:

    common_args (list of str, or None) –

    Pass in additional parameters for the stay-open instance of exiftool.

    Defaults to ["-G", "-n"] as this is the most common use case.

    • -G (groupName level 1 enabled) separates the output with groupName:tag to disambiguate same-named tags under different groups.

    • -n (print conversion disabled) improves the speed and consistency of output, and is more machine-parsable

    Passed directly into common_args property.
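
    For anyone following along, here’s a minimal sketch of how that looks in practice. The -fast flag is just a stand-in for whichever exiftool option you actually need, and note that the -G/-n defaults have to be restated if you still want them:

    import exiftool

    # common_args is forwarded to the stay-open exiftool process;
    # restate the defaults (-G, -n) and tack on any extra flags
    with exiftool.ExifToolHelper(common_args=["-G", "-n", "-fast"]) as et:
      metadata = et.get_metadata(["photo.jpg"])  # placeholder path
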

    As for the renaming, you could handle it with os.path.exists (as with the directory creation) and a bit of logic, along with the utility functions os.path.basename and os.path.splitext, to generate a unique name before the move operation:

    # Ensure uniqueness of path
    basename = os.path.basename(d['SourceFile'])
    filename, ext = os.path.splitext(basename)
    count = 1        
    while os.path.exists(f'{subdirectory}/{basename}'):
      basename = f'{filename}-{count}{ext}'
      count += 1
    
    shutil.move(d['SourceFile'], f'{subdirectory}/{basename}')
    

  • Alright, here’s what I’ve got!

    #!/usr/bin/env python3
    
    import datetime
    import glob
    import os
    import re
    import shutil
    
    import exiftool
    
    
    files = glob.glob(r"/path/to/photos/**/*", recursive=True)
    # Necessary to avoid duplicate files; if all photos have the same extension
    # you could simply add that extension to the end of the glob path instead
    files = [f for f in files if os.path.isfile(f)]
    
    parent_dir = r'/path/to/sorted/photos'
    start_date = datetime.datetime(2015, 1, 1)
    end_date = datetime.datetime(2024, 12, 31)
    date_extractor = re.compile(r'^(\d{4}):(\d{2}):(\d{2})')
    
    with exiftool.ExifToolHelper() as et:
      metadata = et.get_metadata(files)
      for d in metadata:
        for tag in ["EXIF:DateTimeOriginal", "EXIF:CreateDate",
                    "File:FileModifyDate", "EXIF:ModifyDate",
                    "XMP:DateAcquired"]:
          if tag in d.keys():
            # Per file logic goes here
            year, month, day = [int(i) for i in date_extractor.match(d[tag]).group(1, 2, 3)]
            filedate = datetime.datetime(year, month, day)
            if filedate < start_date or filedate > end_date:
              break
            
            # Can uncomment below line for debugging purposes
            # print(f"{d['File:FileName']} {d[tag]} {year}/{month}")
            subdirectory = f'{parent_dir}/{year}/{month}'
            if not os.path.exists(subdirectory):
              os.makedirs(subdirectory)
    
            shutil.move(d['SourceFile'], subdirectory)
            
            break
    

    Other than PyExifTool, which will need to be installed using pip, all the libraries used are part of the standard library. The basic flow of the script is to first grab the metadata for all files using one exiftool command, then, for each file, check for the desired tags in succession. If a tag is found and its date is within the specified range, the script creates the YYYY/MM subdirectory if necessary, moves the file, and proceeds to the next file.

    In my preliminary testing, this seemed to work great! The filtering by date worked as expected, and when I ran it on my whole test set (831 files) it took ~6 seconds of wall time. My gut feeling is that once you’ve implemented the main optimization of handling everything with a single execution of exiftool, this script (regardless of programming language) is going to be heavily I/O bound because the logic itself is simple and the bulk of time is spent reading and moving files, meaning your drive’s speed will be the key limiting factor. Out of those 6 seconds, only half a second was actual CPU time. And it’s worth keeping in mind that I’m doing this on a speedy NVME SSD (6 GB/s sequential read/write, ~300K IOPS random read/write), so it’ll be slower on a traditional HDD.

    There might be some unnecessary complexity for some people’s taste (e.g. using the datetime type instead of simple comparisons like in your bash script), but for something like this I’d rather it be brittle and break on unexpected behavior (because I parsed something wrong or put in nonsensical inputs) than fail silently in a way I might not even notice.
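
    As a quick illustration of what I mean (made-up values, just for demonstration): a plain string comparison happily accepts a nonsense date, while datetime blows up immediately:

    import datetime

    # String comparison: a month of 13 sails right through
    print('2015:13:01' >= '2015:01:01')  # True, no complaints

    # datetime: the same nonsense fails loudly
    datetime.datetime(2015, 13, 1)  # raises ValueError: month must be in 1..12
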

    One important caveat is that none of my photos had that XMP:DateAcquired tag, so I can’t be certain that particular tag will work, and I’m not entirely sure that’s the exact tag name on your photos. You may want to run this tiny script just to check the name and format of the tag to ensure that it’ll work with my script:

    #!/usr/bin/env python3
    
    import exiftool
    import glob
    import os
    
    
    files = glob.glob(r"/path/to/photos/**/*", recursive=True)
    # Necessary to avoid duplicate files; if all photos have the same extension
    # you could simply add that extension to the end of the glob path instead
    files = [f for f in files if os.path.isfile(f)]
    with exiftool.ExifToolHelper() as et:
      metadata = et.get_metadata(files)
      for d in metadata:
        if "XMP:DateAcquired" in d.keys():
          print(f"{d['File:FileName']} {d['XMP:DateAcquired']}")
    

    If you run this on a subset of your data that contains XMP-tagged files and it correctly spits out a list of files plus date metadata beginning with YYYY:MM:DD, you’re in the clear. If nothing shows up or the date format is different, I’d need to modify the script to account for that. In the former case, if you know of a specific file that does have the tag, it’d be helpful to get the exact tag name you see in the output from this script (I don’t need the whole output, just the name of the DateAcquired key):

    #!/usr/bin/env python3
    
    import exiftool
    import json
    
    
    with exiftool.ExifToolHelper() as et:
      metadata = et.get_metadata([r'path/to/dateacquired/file'])
      for d in metadata:
        print(json.dumps(d, indent=4))
    

    If you do end up using this, I’ll be curious to know how it compares to the parallel solution! If the exiftool startup time ends up being negligible on your machine, I’d expect them to be similar (since both are ultimately I/O bound, and the parallel version saves time by having some threads execute while others wait on I/O), but if the exiftool spin-up time constitutes a significant portion of the execution time, you may find this one faster! If you don’t end up using it, no worries; it was a fun little exercise, and I learned about a library that will definitely save me some time in the future if I need to do some EXIF batch processing!


  • Yeah, I think the fact that you need to capture the output and then use that as input to another exiftool command complicates things a lot; if you just need to run an exiftool command on each photo and not worry about the output I think the -stay_open approach would work, but I honestly have no idea how you would juggle the input and output in your case.

    Regardless, I’m glad you were able to see some improvement! Honestly, I’m the wrong person to ask about bash scripts, since I only use them for really basic stuff. There are wizards who do all kinds of crazy stuff with bash, which is incredibly useful if you’re trying to create a portable tool with no dependencies beyond any binaries it may call. But personally, if I’m just hacking myself together something good enough to solve a one-off problem for myself I’d rather reach for a more powerful tool like Python which demands less from my puny brain (forgive my sacrilege for saying this in a Bash community!). Here’s an example of how I might accomplish a similar task in Python using a wrapper around exiftool which allows me to batch process all the files in one go and gives me nice structured data (dictionaries, in this case) without having to do any text manipulation:

    import exiftool
    import glob
    
    files = glob.glob(r"/path/to/photos/**/*", recursive=True)
    with exiftool.ExifToolHelper() as et:
      metadata = et.get_metadata(files)
      for d in metadata:
        for tag in ["EXIF:DateTimeOriginal", "EXIF:CreateDate", "File:FileCreateDate", "File:FileModifyDate", "EXIF:DateAcquired"]:
          if tag in d.keys():
            # Per file logic goes here
            print(f'{d["File:FileName"]} {d[tag]}')
            break
    

    This outline of a script (which grabs the metadata from all files recursively and prints the filename and first date tag found for each) ran in 4.2 s for 831 photos on my machine (so ~5 ms per photo).

    Since I’m not great in bash and not well versed in exiftool’s options, I just want to check my understanding: for each photo, you want to check if it’s in the specified date range, and then if it is you want to copy/move it to a directory of the format YYYYMMDD? I didn’t actually handle that logic in the script above, but I showed where you would put any arbitrary operations on each file. If you’re interested, I’d be happy to fill in the blank if you can describe your goal in a bit more detail!






  • Apparently so

    According to a site admin from that forum post (which is from April 2021; who knows where things stand now):

    If you use the OpenSubtitles website manually, you will have advertisements on the web site, NOT inside the subtitles.

    If you use some API-software to download subtitles (Plex, Kodi, BSPlayer or whatever), you are not using the web site, so you do NOT have these web advertisements. To compensate this, ads are being added on-the-fly to the subtitles itself.

    Also, from a different admin

    add few words from my side - it is good you are talking about ads. They not generating a lot of revenue, but on other side we have more VIP subscriptions because of it :) We have in ads something like “Become VIP member and Remove all ads…”

    Also, the ads in subtitles are always inserted on “empty” space. It is never in middle of movie. What Roozel wrote - “I think placing those ads at the beginning and end is somewhat OK but not in the middle or at random points in the film” - should not happen, if yes, send me the subtitle.

    If the subtitle is from tv series, there are dialogues from beginning usually. System is finding “quiet” place where ads would fit, and yes, this can be after 3 minutes of dialogue…

    This is important to know, I hope now it is more clear about subtitle ads - why we are doing this, there is possibility to remove them and how system works.

    so a scenario like the one in the screenshot isn’t supposed to happen. I guess if you really wanted to see whether it does, you could grab all the English subs via the API and just do a quick grep or what-have-you; a rough sketch of the grep half is below.
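
    In case anyone wants to try, here’s a minimal sketch of that grep step in Python, assuming the subs have already been downloaded somewhere. The path and the ad phrases are placeholders (the phrases are guessed from the admin quote above):

    #!/usr/bin/env python3

    import glob

    # Assumed ad phrasing, based on the admin quote above; adjust as needed
    AD_MARKERS = ["OpenSubtitles", "VIP member"]

    # Scan every downloaded .srt for lines containing a suspected ad string
    for path in glob.glob(r"/path/to/subs/**/*.srt", recursive=True):
      with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
          if any(marker in line for marker in AD_MARKERS):
            print(f"{path}:{lineno}: {line.strip()}")
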