• 0 Posts
  • 23 Comments
Joined 1 year ago
Cake day: August 24th, 2023

  • Python

    Execution time: under 1 millisecond (~800 microseconds on my machine)

    Good old-school linear algebra from middle school. We can solve this really fast, with minimal changes from part 1!
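    To spell out the algebra the code relies on, here is a quick sketch. The numbers are the first example machine from the puzzle description, so treat them as illustrative:

    # Each machine is a 2x2 system in the press counts x (button A) and y (button B):
    #   Ax*x + Bx*y = Px
    #   Ay*x + By*y = Py
    # Multiply the first equation by Ay, the second by Ax, and subtract to eliminate x:
    #   y = (Ay*Px - Ax*Py) / (Ay*Bx - Ax*By)
    # then back-substitute: x = (Px - Bx*y) / Ax.
    # A prize is only reachable when the divisions come out exact.
    Ax, Ay, Bx, By, Px, Py = 94, 34, 22, 67, 8400, 5400   # illustrative example machine
    y, r = divmod(Ay * Px - Ax * Py, Ay * Bx - Ax * By)
    if r == 0:
        x = (Px - Bx * y) // Ax
        print(x, y, 3 * x + y)   # 80 presses of A, 40 of B, 280 tokens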

    Fast Code


    from time import perf_counter_ns
    import string
    
    def profiler(method):
        def wrapper_method(*args, **kwargs):
            start_time = perf_counter_ns()
            ret = method(*args, **kwargs)
            stop_time = perf_counter_ns() - start_time
            time_len = min(9, ((len(str(stop_time))-1)//3)*3)
            time_conversion = {9: 'seconds', 6: 'milliseconds', 3: 'microseconds', 0: 'nanoseconds'}
            print(f"Method {method.__name__} took : {stop_time / (10**time_len)} {time_conversion[time_len]}")
            return ret
    
        return wrapper_method
    
    @profiler
    def main(input_data):
        part1_total_cost = 0
        part2_total_cost = 0
        for machine in input_data:
            # Pull the six integers per machine: button A step, button B step, prize position.
            Ax,Ay,Bx,By,Px,Py = [ int(l[2:]) for l in machine.split() if l[-1] in string.digits ]
            # Eliminate x between Ax*x + Bx*y = Px and Ay*x + By*y = Py; an exact division means the prize is reachable.
            y,r = divmod((Ay * Px - Ax * Py), (Ay * Bx - Ax * By))
            if r == 0:
                x = (Px - Bx * y) // Ax
                part1_total_cost += 3*x + y
            # Part 2 is the same system with the prize shifted by 10**13 on both axes.
            y,r = divmod((Ay * (Px+10000000000000) - Ax * (Py+10000000000000)), (Ay * Bx - Ax * By))
            if r == 0:
                x = ((Px+10000000000000) - Bx * y) // Ax
                part2_total_cost += 3*x + y
    
        return part1_total_cost,part2_total_cost
    
    if __name__ == "__main__":
        with open('input', 'r') as f:
            input_data = f.read().strip().replace(',', '').split('\n\n')
        part_one, part_two = main(input_data)
        print(f"Part 1: {part_one}\nPart 2: {part_two}")
    
    


  • The only concept/idea that offers this benefit right now is a healthy, competitive market where companies don't form a monopoly or structure themselves so that each one can set prices higher just because everyone else is doing it. Look at how Intel is struggling right now to stay relevant: they are entering the GPU market with some very competitive mid-range options that outperform Nvidia and AMD on common consumer tasks, while still offering their own blend of AI cores for new tasks we will likely start seeing in the near future. On top of that, Intel is offering these mid-range cards at 60% of the cost of the closest competition. While I enjoy seeing this, I am annoyed that this is the only way we see prices come down. We really need to come together as a community and push prices to more reasonable levels.

    Inflation is insane right now, and what annoys me most is people thinking UBI would even help. Once companies realize they can just charge more for the same identical or marginally improved product, the benefit becomes irrelevant. I see the floor for prices steadily increasing, and UBI would just cause it to rise to the point that the benefit is worthless, with purchasing power eroded until a dollar is worth less than what a penny used to be.


  • I agree, and I think the best service we have, though it is being overshadowed by Amazon, is the US Postal Service. It really needs a push to modernize.

    I also think that instead of UBI, anything that is a basic need should be taxed on a progressive schedule instead of a flat percentage. That way, if they try to make it more expensive, it gets taxed too heavily to be viable. We need to combat this inflation and make it so that a lower-priced item is more profitable!


  • Why are you under the impression that UBS (universal basic services) will not pay for those services?

    The US Postal Service is the biggest UBS, and most Americans pay for it with taxes. Those who can't afford to pay taxes, or can't earn money to pay them, still benefit from it as if it were "free".

    You seem to think it doesn't exist or will not work. Yet it does. Libraries exist, public transportation exists. People's needs can be met.


  • Only if that income is required for basic necessities that everyone will need in their lives. As a generalization, there are three things off the top of my head that everyone needs: housing, healthcare, and food. There are many more basic needs people should have fulfilled, but I digress.

    Currently, people in many first world and third world countries/classes are reliant on funding to fulfill most if not all basic needs. That is when UBI should be mandatory. How is something like that funded? Like everything else: we all pay for it. Call it taxes, call it charity, call it whatever you want.

    Yet why would someone need UBI for basic needs? Mostly because the general public is increasingly divided and distrustful of centralized sources/authorities, yet the only way UBI can happen is through exactly that kind of system.

    So, all in all, I don't think I would support UBI. I like federated services and decentralization, and I don't like the current state of all basic needs being behind paywalls. It is disappointing. I don't know what would help us the most, but I can't see how moving in this direction would be kind to people low on the economic scale or helpful for most who are barely scraping by. Even though I live more or less comfortably right now, there are many basic needs in my life that I would still want to improve substantially or have become available to me.


  • You are right, I don't think loading the file from disk should be part of it, because OS process priority/queueing, disk, cache, and more make it a headache to figure out what is actually slow. If you want to compare the entire execution, including Python startup overhead, reading from file, and everything extra, it is closer to 50-60 ms on Linux and 80-90 ms on Windows (both hosts, not virtual machines).

    My reasoning is that loading the input will always pull from either the website or disk, and that is not part of the challenge. You could simply hard-code it.

    So maybe you should look into adding code to your debug mode, or to both modes, that measures just the solving step instead of the entire process including loading.
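    Something like this is what I mean, as a minimal sketch (solve() is a placeholder, not your actual code):

    from time import perf_counter_ns

    def solve(puzzle_input):
        # placeholder for the real solver, kept trivial so the sketch runs
        return sum(len(line) for line in puzzle_input.splitlines())

    if __name__ == "__main__":
        with open('input', 'r') as f:      # file I/O stays outside the timed region
            puzzle_input = f.read()
        start = perf_counter_ns()
        answer = solve(puzzle_input)       # only the algorithm is timed
        elapsed_us = (perf_counter_ns() - start) / 1000
        print(f"{answer} in {elapsed_us:.0f} microseconds")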

    However, someone claims their Rust code can do it in 250 microseconds, so I doubt you have much excuse aside from having "inefficient" code (you are still fast, just not at the limit of your language's performance). Only measuring my Python algorithm, it finishes in about 32,000 microseconds.

    https://github.com/maneatingape/advent-of-code-rust/blob/main/src/year2024/day11.rs

    However, now that I am looking at their main.rs file, they do measure completion time after process startup, and only for the algorithm.


  • Hey, that is what release mode is for! Optimizing by unrolling and other tricks is needed, but in your case, I think I remember someone mentioning their Rust release mode is closer to 2 ms.

    I did try something like PyPy3, but it seems to be slower by ~3 milliseconds! So I don't know where I could improve my code any further without unrolling the range(75) loop, though that would only get it closer to ~29 ms on average.

    Edit: apparently they updated their code to be closer to 250 microseconds. That is blazing fast. [ link ]


  • Learning to profile code gives you a chance to learn what inefficient code looks like! I definitely like to spend some time looking at it, but at the end of the day, I really need to learn more languages. For now, I am sticking with trusty Python. The image below is in microseconds.

    screenshots

    If the Python process were kept alive, we would only be saving 25 milliseconds, from ~250 to ~235! However, in real-world testing, it seems the profiler is not really a proper enough test, likely because it adds some overhead to each line of code.
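    For wall-clock numbers without the per-line instrumentation, timeit is a more faithful check. A rough sketch, with solve() standing in for the real function:

    import timeit

    def solve():
        # stand-in for the real day 11 main(); trivial so the sketch runs
        return sum(i * i for i in range(1000))

    # best of 5 repeats, 10 calls each; the minimum is the least noisy estimate
    best = min(timeit.repeat(solve, repeat=5, number=10)) / 10
    print(f"~{best * 1e3:.3f} ms per call")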

    Notice here, if I add this block of code:

    from os.path import dirname, realpath, isfile, join
    from ast import literal_eval
    
    transform_cache = {}
    BASE_DIR = dirname(realpath(__file__))
    if isfile(join(BASE_DIR, r'transform_cache')):
        with open(join(BASE_DIR, 'transform_cache'), 'r') as f:
            transform_cache = literal_eval(f.read().replace('\r','').replace('\n',''))
    transform_cache = {} # this makes the code faster???
    

    Notice I load a transform_cache from a previous run from file. However, because of the "if stone in transform_cache" check, the loaded transform_cache is for some reason slower than letting it fill up again. Loading it and then clearing it is faster, probably because the CPU/RAM/OS are doing their own caching too. If we remove the "if stone in transform_cache" check and keep the transform_cache fully loaded, then it is faster by ~1 millisecond, down to 29 milliseconds! These are the niche problems with caching things and squeezing all the performance out of code.
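    For what it's worth, functools.lru_cache will manage this kind of cache automatically. A sketch of the idea (not my actual code), using the same split/multiply rule:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def transform(stone: str) -> tuple:
        # same rule as the manual transform_cache: 0 -> 1, even digit count -> split, else multiply by 2024
        if stone == '0':
            return ('1',)
        if len(stone) % 2 == 0:
            mid = len(stone) // 2
            return (stone[:mid], stone[mid:].lstrip('0') or '0')
        return (str(int(stone) * 2024),)

    print(transform('2024'))           # ('20', '24')
    print(transform.cache_info())      # hit/miss counts come for free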


  • Python

    Part 1: ~2 milliseconds, Part 2: ~32 milliseconds, Total Time: ~32 milliseconds
    You end up doing part 1 at the same time as part 2, but because of how Advent of Code works, you need to rerun the code after part 1 is solved, so Part 2 is technically the total time.

    Fast Code
    from time import perf_counter_ns
    
    transform_cache = {'0': [ '1']}
    
    def transform(current_stone):
        # Even number of digits: split in two, stripping leading zeros on the right half (but keep at least '0').
        if len(current_stone) % 2 == 0:
            mid = len(current_stone) // 2
            res = [current_stone[:mid], current_stone[mid:].lstrip('0').rjust(1,'0')]
        else:
            # Odd number of digits: multiply by 2024 (the '0' -> '1' rule is pre-seeded in transform_cache).
            res = [str(int(current_stone) * 2024)]
        transform_cache[current_stone] = res
        return res
    
    def main(initial_stones):
        stones_count = {}
        for stone in initial_stones.split():
            stones_count[stone] = stones_count.get(stone, 0) + 1
        part1 = 0
        for i in range(75):
            new_stones_count = {}
            for stone, count in stones_count.items():
                for r in (transform_cache.get(stone) if stone in transform_cache else transform(stone)):
                    new_stones_count[r] = new_stones_count.get(r, 0) + count
            stones_count = new_stones_count
            if i == 24:
                # part 1 is the stone count after 25 blinks
                part1 = sum(stones_count.values())
        return part1,sum(stones_count.values())
    
    if __name__ == "__main__":
        with open('input', 'r') as f:
            input_data = f.read().replace('\r', '').replace('\n', '')
        start_time, part_one, part_two = perf_counter_ns(),*main(input_data)
        stop_time = perf_counter_ns() - start_time
        time_len = min(9, ((len(str(stop_time))-1)//3)*3)
        time_conversion = {9: 'seconds', 6: 'milliseconds', 3: 'microseconds', 0: 'nanoseconds'}
        print(f"Part 1: {part_one}\nPart 2: {part_two}\nProcessing Time: {stop_time / (10**time_len)} {time_conversion[time_len]}")
    
    




  • idk why my GPT decided to be like that. I expected a long-winded response with a little bit of AI hallucination. I was flabbergasted and just had to post it.

    I realized that working forward through the list of integers was inefficient, and I was using multiprocessing to do it too! Even so, my old solution took just under 15 seconds for my input. Your solution to reverse the operations and eliminate paths was brilliant and made it solve in milliseconds without spinning up my fans, lol.
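    For anyone curious, the reverse-and-prune idea looks roughly like this, assuming the usual +, *, and concatenation operators from that puzzle (a sketch of the technique, not the exact code being discussed):

    def solvable(target: int, nums: list[int]) -> bool:
        """Work backwards from the target, undoing the last operator and pruning impossible branches."""
        if len(nums) == 1:
            return target == nums[0]
        *rest, last = nums
        # Undo a multiplication only if the target divides evenly.
        if target % last == 0 and solvable(target // last, rest):
            return True
        # Undo a concatenation only if the target literally ends with the last number.
        t, l = str(target), str(last)
        if len(t) > len(l) and t.endswith(l) and solvable(int(t[:-len(l)]), rest):
            return True
        # Undo an addition only if the target stays positive.
        return target - last > 0 and solvable(target - last, rest)

    print(solvable(292, [11, 6, 16, 20]))   # True  (11 + 6 * 16 + 20, from the puzzle example)
    print(solvable(161011, [16, 10, 13]))   # False (unsolvable in the puzzle example)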




  • So if I look at each part of my code, the first 4 lines will take 20 ms:

    # strip newlines, then expand the dense disk map into blocks
    input_data = input_data.replace('\r', '').replace('\n', '')
    # even positions are files (id = i//2), odd positions are runs of free space
    part2_data = [[i//2 for _ in range(int(x))] if i%2==0 else ['.' for _ in range(int(x))] for i,x in enumerate(input_data)]
    part2_data = [ x for x in part2_data if x!= [] ]   # drop zero-length runs
    part1_data = [y for x in part2_data for y in x]    # flat block list for part 1
    

    The part1 for loop will take 10 ms.

    The for loop to set up next_empty_slot_by_length will take another 10 ms.

    The part2 for loop will take 10 ms, too!

    And adding up the part 2 checksums will add another 10 ms.

    So, in total, it will do it in ~60 ms, but Python startup overhead seems to add 20-40 ms depending on whether you are on Linux (20 ms) or Windows (40 ms), both hosts, not virtual. Linux usually has a faster startup time.
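    This kind of per-section breakdown is easy to get with a few perf_counter_ns() checkpoints, roughly like this (the section labels are placeholders for the real steps):

    from time import perf_counter_ns

    marks = [('start', perf_counter_ns())]
    def checkpoint(label):
        marks.append((label, perf_counter_ns()))

    # ... build part2_data / part1_data here ...
    checkpoint('parse + expand')
    # ... part 1 loop here ...
    checkpoint('part 1')
    # ... part 2 loop and checksum here ...
    checkpoint('part 2')

    for (_, prev), (label, now) in zip(marks, marks[1:]):
        print(f"{label}: {(now - prev) / 1e6:.1f} ms")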

    I am not sure where I would see a speed-up. It seems that the startup overhead makes this just slower than the other top-performing solutions, which are also hitting a limit of 40-60 ms.


  • Ah well, I tried switching to Python's set(), but it was slow because it is unordered; I would need to use min() to find the minimum index number, which was slow af. Indexing might be fast, but pop(0) on a list is also just as fast (switching to deque gave no speed-up either). The list operations I am using are mostly O(1) time.

    If I comment out this part, which does the adding:

        # adds checksums
        part2_data = [y for x in part2_data for y in x]
        part2 = 0
        for i,x in enumerate(part2_data):
            if x != '.':
                part2 += i*x
    

    so that it isolates the checksum part, it is still only 80-100 ms. So the checksum part causes no noticeable slowdown; even if I did the checksum at the same time as the sorting, it would not lower the execution time.


  • Yeah, I was a bit exhausted from thinking in a high-level abstract way. I do think that if I did the checksum at the same time I could shave off a few more milliseconds, though it is at about the limits of speed, especially for Python with its limited data types (no trees, lol). Decently fast enough for me :)

    Edit: I also just tested it, and splitting into two lists gave no decent speed-up and actually made it slower. Really, iterating backwards with that list slice is fast. I can't think of another way to speed it up past what it can do right now.


  • Thanks! Your Haskell solution is extremely fast, and I don't understand it either. 🙈 lol

    My latest revision just keeps a dict of lists of known empty slots, with the slot length as the dict key (including partially filled slots). I iteratively find the slot with the lowest index number and make sure the lists stay ordered from lowest to highest index (a rough sketch of this idea follows the example below).

    Looking at the challenge example/description, it shows a first-pass-only type of "fragmenting": we can be confident that if something did not fit, it just stays in the same spot, even if another slot later frees up enough space for it. So checking whether the current index is lower than the lowest index of any of the slot lengths is enough to stop early. That is why I got rid of last_consecutive_full_partition; it was slowing things down by up to 2 seconds.

    In the example, even if 5555, 6666, or 8888 could fit in the new spot created by moving 44, they stay put. Thus it is a first-pass-only sort from back to front:

    00...111...2...333.44.5555.6666.777.888899
    0099.111...2...333.44.5555.6666.777.8888..
    0099.1117772...333.44.5555.6666.....8888..
    0099.111777244.333....5555.6666.....8888..
    00992111777.44.333....5555.6666.....8888..
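
    Here is a rough sketch of that slot-tracking idea, purely illustrative and not my actual code: one min-heap of gap start positions per gap length (heapq stands in for the sorted lists described above), run against the disk map from the example:

    import heapq

    def compact_whole_files(disk_map: str) -> int:
        """First-pass-only defrag: move each file at most once, to the leftmost gap that fits, then checksum."""
        files = []            # (start, length, file_id)
        gaps = {}             # gap length -> min-heap of start positions
        pos = 0
        for i, ch in enumerate(disk_map):
            length = int(ch)
            if length:
                if i % 2 == 0:
                    files.append((pos, length, i // 2))
                else:
                    gaps.setdefault(length, []).append(pos)
            pos += length
        for heap in gaps.values():
            heapq.heapify(heap)

        checksum = 0
        for start, length, file_id in reversed(files):      # back to front, single pass
            # find the leftmost gap of any usable length that sits left of the file
            best_len, best_start = 0, start
            for gap_len, heap in gaps.items():
                if gap_len >= length and heap and heap[0] < best_start:
                    best_len, best_start = gap_len, heap[0]
            if best_len:
                heapq.heappop(gaps[best_len])
                start = best_start
                leftover = best_len - length
                if leftover:                                  # push the shrunken gap back
                    heapq.heappush(gaps.setdefault(leftover, []), best_start + length)
            # sum of positions start..start+length-1, times the file id
            checksum += file_id * (length * start + length * (length - 1) // 2)
        return checksum

    print(compact_whole_files("2333133121414131402"))   # 2858 for the example above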