
GenServer timeout crash in long-running pipeline #230

@starcraft66

Description


I wrote a crawler that crawls media galleries and downloads their files using HTTPoison in a custom pipeline. This works fine for small files because they download quickly, but when I crawl a really big file, such as a video, I almost always get this crash:

12:49:51.241 [error] GenServer #PID<0.471.0> terminating
** (stop) exited in: GenServer.call(Crawly.DataStorage, {:store, KemonoCrawler, %{...}, 5000)
    ** (EXIT) exited in: GenServer.call(#PID<0.468.0>, :stats, 5000)
        ** (EXIT) time out
    (elixir 1.14.0) lib/gen_server.ex:1038: GenServer.call/3
    (elixir 1.14.0) lib/enum.ex:975: Enum."-each/2-lists^foreach/1-0-"/2
    (crawly 0.14.0) lib/crawly/worker.ex:173: Crawly.Worker.process_parsed_item/1
    (crawly 0.14.0) lib/crawly/worker.ex:50: Crawly.Worker.handle_info/2
    (stdlib 4.1) gen_server.erl:1123: :gen_server.try_dispatch/4
    (stdlib 4.1) gen_server.erl:1200: :gen_server.handle_msg/6
    (stdlib 4.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Last message: :work
State: %Crawly.Worker{backoff: 10000, spider_name: KemonoCrawler, crawl_id: "e8e7bb02-465f-11ed-85df-3a9adacac61c"}

Is there a setting I can tweak to allow my pipeline to run for much longer? I didn't see anything easily tweakable in the config.
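For context, here is a minimal sketch of what the trace above appears to show. It assumes that the process behind `#PID<0.468.0>` is busy running the slow pipeline step. If so, the nested `GenServer.call(#PID<0.468.0>, :stats, 5000)` can only be answered after that process finishes its current work, so a download that takes longer than the default 5000 ms timeout makes the caller exit with `time out`. `SlowStorage` and `Demo` are hypothetical stand-ins, not Crawly code. `Process.sleep/1` stands in for the HTTPoison download, and a short timeout keeps the demo fast.

```elixir
# Hypothetical stand-in for a storage worker that runs pipelines (NOT Crawly's code).
defmodule SlowStorage do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts), do: {:ok, %{stored: 0, offload?: Keyword.get(opts, :offload?, false)}}

  # Simplification: storing an item is a cast whose handling includes the slow step.
  @impl true
  def handle_cast({:store, work_ms}, %{offload?: false} = state) do
    # The slow step (e.g. a large download) runs inline and blocks the server.
    Process.sleep(work_ms)
    {:noreply, %{state | stored: state.stored + 1}}
  end

  def handle_cast({:store, work_ms}, %{offload?: true} = state) do
    # The slow step runs in a separate process, so the server stays responsive.
    Task.start(fn -> Process.sleep(work_ms) end)
    {:noreply, %{state | stored: state.stored + 1}}
  end

  # Analogous to the :stats call in the trace: it waits until the server is free.
  @impl true
  def handle_call(:stats, _from, state), do: {:reply, state.stored, state}
end

defmodule Demo do
  # Returns {:ok, stored_count}, or :timeout if :stats is not answered in time.
  def stats_after_store(offload?, work_ms, timeout_ms) do
    {:ok, pid} = SlowStorage.start_link(offload?: offload?)
    GenServer.cast(pid, {:store, work_ms})

    try do
      {:ok, GenServer.call(pid, :stats, timeout_ms)}
    catch
      :exit, {:timeout, _} -> :timeout
    end
  end
end
```

With 500 ms of "work" and a 100 ms call timeout, `Demo.stats_after_store(false, 500, 100)` returns `:timeout`, and `Demo.stats_after_store(true, 500, 100)` returns `{:ok, 1}`. Moving the slow work out of the process that has to answer calls avoids the timeout in this toy model. I have not checked whether that approach fits a Crawly pipeline, which is expected to return the processed item.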
