- Create a virtual environment for the project using Python 3.8+.
- Install the requirements with `pip install -r requirements.txt`.
- Update the search URLs in the file `./imot_bg_crawler/input.yaml`. When done, check with http://www.yamllint.com/ that the input file is valid.
- Run the spider for the desired website. If you do not want logs, add `--nolog` at the end of the command.
- When finished, check the `./imot_bg_crawler/output_files` folder for the results.
- Enjoy.
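As a small convenience for the last step, the sketch below lists whatever ended up in the output folder. It assumes the script is run from the project root so that the relative path resolves; the helper name `list_result_entries` is hypothetical and not part of the crawler.

```python
from pathlib import Path


def list_result_entries(output_dir="./imot_bg_crawler/output_files"):
    """Return the sorted names of everything directly inside the output folder.

    Hypothetical helper: the default path is the results folder named in the
    steps above, resolved relative to the current working directory.
    """
    folder = Path(output_dir)
    if not folder.is_dir():
        # Nothing has been crawled yet, or the script was started elsewhere.
        return []
    return sorted(entry.name for entry in folder.iterdir())
```

Calling `list_result_entries()` after a crawl returns the entry names as a list; an empty list means the folder is missing or empty.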
- Imot.bg: `scrapy crawl imot.bg`
- Imoti.com: `scrapy crawl imoti.com`
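If you prefer to start a crawl from a script, the commands above can be assembled roughly as follows. This is a minimal sketch: the two spider names and the `--nolog` flag come from this README, while `build_crawl_command`, `run_spider`, and `KNOWN_SPIDERS` are hypothetical names introduced only for illustration. It also assumes `scrapy` is on the PATH of the active virtual environment and that the script is started from the project directory.

```python
import subprocess

# Spider names as listed in this README.
KNOWN_SPIDERS = ("imot.bg", "imoti.com")


def build_crawl_command(spider, quiet=False):
    """Build the argument list for `scrapy crawl <spider>` (hypothetical helper)."""
    if spider not in KNOWN_SPIDERS:
        raise ValueError(f"Unknown spider: {spider!r}")
    command = ["scrapy", "crawl", spider]
    if quiet:
        # --nolog goes at the end of the command and suppresses log output.
        command.append("--nolog")
    return command


def run_spider(spider, quiet=False):
    """Run the spider in a child process; raises if scrapy exits with an error."""
    subprocess.run(build_crawl_command(spider, quiet=quiet), check=True)
```

For example, `run_spider("imot.bg", quiet=True)` runs the equivalent of `scrapy crawl imot.bg --nolog`.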
SKIP_EXISTING - skips saving an item whose data has already been saved, default True
PER_ITEM_RESULT - saves every item in a separate folder, default True
PER_ITEM_DOWNLOAD_IMAGES - when PER_ITEM_RESULT is enabled, controls whether the crawler downloads the item's images, default True
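To make the interaction between these settings concrete, here is a sketch of how the three flags could be modelled. It only illustrates the rule stated above, that image downloading matters only when per-item results are enabled. The `CrawlOptions` class and its `downloads_images` property are hypothetical and are not the crawler's actual code; where the project defines these settings is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlOptions:
    """Hypothetical model of the three settings, with their documented defaults."""

    skip_existing: bool = True              # SKIP_EXISTING
    per_item_result: bool = True            # PER_ITEM_RESULT
    per_item_download_images: bool = True   # PER_ITEM_DOWNLOAD_IMAGES

    @property
    def downloads_images(self) -> bool:
        # Images are only downloaded when per-item results are enabled too.
        return self.per_item_result and self.per_item_download_images
```

With the defaults, `CrawlOptions().downloads_images` is `True`; turning off `per_item_result` makes it `False` regardless of `per_item_download_images`.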