Many people online are praising how useful this tool is, but I still couldn't find a complete tutorial.
This article walks you through getting it running.
Because web scraping is a sensitive area, this article does not cover how to fix pages that are not scraped correctly; you will have to solve those retrieval problems yourself.
This tutorial is written for Windows; on other systems you may need to adjust the steps accordingly.
A quick introduction to the project#
Chief Intelligence Officer (Wiseflow) is an agile information mining tool. Based on the focus points you set, it extracts information from sources such as websites, WeChat official accounts and social platforms, then automatically classifies it and uploads it to the database.
Environment preparation#
python (tested with 3.11.6)
ollama client
git
wiseflow project
pocketbase database
Installation#
Install python#
https://www.python.org/ftp/python/3.11.6/python-3.11.6-amd64.exe
Download and install it; remember to check "Add python.exe to PATH" on the first installer screen.
Install git#
https://github.com/git-for-windows/git/releases/download/v2.46.0.windows.1/Git-2.46.0-64-bit.exe
Download and install it, keeping the default options and clicking "Next" all the way through.
Install ollama#
https://ollama.com/download/OllamaSetup.exe
Download and install
Change pip source#
Then open a command prompt and run the following command to switch pip's package index to the Huawei Cloud mirror:
pip config set global.index-url https://mirrors.huaweicloud.com/repository/pypi/simple
Clone the project#
Then enter the command to clone the project
git clone https://github.com/TeamWiseFlow/wiseflow
pocketbase database#
Download PocketBase and keep it handy; it is set up in the configuration step below.
Install project environment#
cd wiseflow
cd core
pip install -r requirements.txt
Configure the project#
wiseflow configuration#
Extract the downloaded PocketBase archive to /wiseflow/core/pb
Go to the pb directory and execute the following command in the command line
.\pocketbase migrate up
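.\pocketbase --dev admin create [your email] [your password]
Any email and password will do, but note them down: you will need them again for PB_API_AUTH and to log in to the admin page.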
Move the .sh scripts in the /core/scripts folder to the /core directory
Modify start_backend.sh
#!/bin/bash
set -o allexport
source ./env
set +o allexport
exec uvicorn backend:app --reload --host localhost --port 8077
Modify start_tasks.sh
#!/bin/bash
set -o allexport
source ./env
set +o allexport
exec python tasks.py
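In both scripts, make sure the source line reads "source ./env" (delete the extra "."), because the env file will be placed in the /core folder next to the scripts.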
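If you prefer to make the two script edits above (moving the scripts and fixing the source line) from the command line, here is a minimal sketch for Git Bash. It assumes you run it from the /wiseflow folder and that the scripts contain either "source ../.env" or "source ../env"; `patch_scripts` is a hypothetical helper name, not part of wiseflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wiseflow): move the start scripts from
# core/scripts into core, then point their `source` line at ./env.
# Assumes the scripts contain "source ../.env" or "source ../env".

patch_scripts() {
  local core_dir="${1:-core}" script target
  for script in "$core_dir"/scripts/*.sh; do
    [ -e "$script" ] || continue   # glob matched nothing: no scripts to move
    target="$core_dir/$(basename "$script")"
    mv "$script" "$target"
    # Rewrite the source line; writing to a temp file avoids the
    # GNU/BSD differences in `sed -i`.
    sed -E 's#^source \.\./\.?env$#source ./env#' "$target" > "$target.tmp"
    mv "$target.tmp" "$target"
    chmod +x "$target"             # the temp file did not keep the execute bit
  done
}

# Usage, from the wiseflow folder in Git Bash:
#   patch_scripts core
```

Editing the two files by hand in a text editor gives the same result.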
Then right-click each .sh file, choose "Open with", and set it to open with Git Bash (installed with Git above), so that double-clicking a script runs it.
Copy the env_sample file in the /wiseflow folder and rename it to env
Then modify the content as follows
export LLM_API_KEY=" " ##keep the space here; leaving the value empty causes an error
export LLM_API_BASE="http://127.0.0.1:11434/v1/" ##for local model services, or non-OpenAI services called through openai_wrapper
##strongly recommended to use the following models provided by siliconflow (considering both quality and price)
export GET_INFO_MODEL="qwen2:7b"
export REWRITE_MODEL="qwen2:7b"
export HTML_PARSE_MODEL="qwen2:7b" ##or "01-ai/Yi-1.5-9B-Chat"
export PROJECT_DIR="work_dir"
export PB_API_AUTH="[your email]|[your password]" ##the PocketBase admin account created above
# export PB_API_BASE="" ##only needed if your PocketBase does not run on 127.0.0.1:8090
export WS_LOG="verbose" ##for detailed logs; delete this line if you don't need them
Then copy the env file to the /core folder
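Before going further, you can check that the env file loads the way the start scripts load it (set -o allexport, then source). This is a minimal sketch for Git Bash; `check_env` is a hypothetical helper, and its list of required variables is simply taken from the env file above, not from wiseflow's own code.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wiseflow): load an env file the same way
# start_backend.sh and start_tasks.sh do, and report settings that are missing.
# The variable list mirrors the env file above; adjust it if yours differs.

check_env() {
  local env_file="${1:-./env}"
  if [ ! -f "$env_file" ]; then
    echo "env file not found: $env_file" >&2
    return 1
  fi
  (
    # Subshell: the loaded variables do not leak into the calling shell.
    set -o allexport
    source "$env_file"
    set +o allexport

    status=0
    for name in LLM_API_KEY LLM_API_BASE GET_INFO_MODEL REWRITE_MODEL \
                HTML_PARSE_MODEL PROJECT_DIR PB_API_AUTH; do
      # ${!name} is bash indirect expansion; " " (a single space) counts as set.
      if [ -z "${!name:-}" ]; then
        echo "missing or empty: $name" >&2
        status=1
      fi
    done
    # PB_API_AUTH is expected in the form "email|password".
    case "${PB_API_AUTH:-}" in
      *"|"*) ;;
      *) echo "PB_API_AUTH should look like email|password" >&2; status=1 ;;
    esac
    [ "$status" -eq 0 ] && echo "env looks complete: $env_file"
    exit "$status"
  )
}

# Usage, from the /core folder in Git Bash:
#   check_env ./env
```

Running the check in a subshell keeps your terminal clean, which matters if you test several variants of the env file one after another.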
ollama configuration#
The project officially recommends qwen2:7b, so we'll use that model. If you know a better option, please recommend it in the comments.
Then enter the command
ollama pull qwen2:7b
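To confirm the model is in place, you can check the output of `ollama list` and send one test request to the OpenAI-compatible endpoint that LLM_API_BASE points to. A sketch, assuming ollama is running on its default port 11434; the function names are hypothetical and not part of wiseflow or ollama.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of wiseflow or ollama) for checking the model.

# Succeeds if `ollama list` output, read from stdin, has a row whose NAME
# column (first field, after the header line) equals the given model name.
has_model() {
  awk -v want="$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}

# Checks the locally pulled models for the one wiseflow is configured to use.
check_ollama_model() {
  local model="${1:-qwen2:7b}"
  if ollama list | has_model "$model"; then
    echo "$model is installed"
  else
    echo "$model not found; run: ollama pull $model" >&2
    return 1
  fi
}

# Sends one chat request to ollama's OpenAI-compatible endpoint,
# the same base URL that LLM_API_BASE points to in the env file.
ask_model() {
  local model="${1:-qwen2:7b}"
  curl -s http://127.0.0.1:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"Say hello\"}]}"
}

# Usage:
#   check_ollama_model qwen2:7b && ask_model qwen2:7b
```

If ask_model returns a JSON reply, the same base URL should work for wiseflow.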
Run the project#
Double-click start_backend.sh and start_pb.sh in the /core folder to start them.
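Before logging in (and before running start_tasks.sh later), you can confirm that both services respond. A minimal sketch, assuming curl is available (it ships with Git for Windows); `wait_for_url` is a hypothetical helper, and the ports are the defaults used in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of wiseflow): poll a URL until something answers.
# Any HTTP response counts as "up" (no -f), so a 404 from the backend is fine.

wait_for_url() {
  local url="$1" tries="${2:-10}" i
  for ((i = 1; i <= tries; i++)); do
    if curl -s -o /dev/null --max-time 2 "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep 1
  done
  echo "no response from $url after $tries attempts" >&2
  return 1
}

# Usage in Git Bash: PocketBase serves a health endpoint on port 8090, and
# start_backend.sh starts uvicorn on localhost port 8077.
#   wait_for_url http://127.0.0.1:8090/api/health && wait_for_url http://localhost:8077/
```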
Enter http://127.0.0.1:8090/_/ in the browser
Then log in with the email and password you set above.
Add the sites you want to monitor and the tags (your focus points)
Don't forget to mark them as activated
Then run start_tasks.sh
The crawled articles are printed in the command line, and you can also find them in the articles section of the admin page.
Afterword#
The project currently does not support RSSHub, so you will need to handle crawling of certain websites yourself.
Because I'm using an AMD Radeon RX 7600M XT graphics card, the model overflows its video memory, so I can't say how well it performs, but I can confirm that this configuration runs.
According to the project's official announcement:
SiliconFlow has officially announced that several online LLM inference services, such as Qwen2-7B-Instruct and glm-4-9b-chat, are now free, which means you can use Chief Intelligence Officer for information mining at "zero cost"!
By the time this article was published, however, these had become paid services, and given the amount of data the program retrieves, using the paid API would be costly.