auto_research.search.core module

class AutoSearch(keywords, num_results=30, delay=1, sort_by='relevance', date_cutoff='2024-01-01', score_threshold=0.5, recency_weight=3.5, auto_destination=False, destination_folder='search_results', zip_folder=True)[source]

Bases: object

A class to search for academic papers by keyword(s), retrieve their details, and optionally download them.

Parameters:
keywords

The keyword(s) to search for. If a string, performs a single search. If a list, performs multiple searches.

Type:

str | list[str]

num_results

The number of results to retrieve.

Type:

int

delay

Added delay (in seconds) between requests to avoid rate limiting.

Type:

int

sort_by

Sorting criteria for the Google Scholar search engine (“date” or “relevance”).

Type:

str

date_cutoff

The cutoff date for papers when sorting by date (format: “YYYY-MM-DD”).

Type:

str

score_threshold

The minimum combined score for papers to be displayed/downloaded. The combined score is calculated differently based on the sorting criteria:

  • If sorting by “date”:

    \[\text{combined_score} = \frac{\text{citation_count}}{\left(\frac{365 + \text{days_ago}}{365}\right)^{\text{recency_weight}}}\]

    where \(\text{days_ago}\) is the number of days since the paper was published.

  • If sorting by “relevance”:

    \[\text{combined_score} = \frac{\text{citation_count}}{\text{recency}^ {\text{recency_weight}}}\]

    where \(\text{recency}\) is the number of years since the paper was published.

The \(\text{recency_weight}\) parameter controls how much weight is given to the recency of the paper.

Type:

float

recency_weight

The weight given to recency when calculating the combined score.

Type:

float

auto_destination

Whether to automatically generate the destination folder name.

Type:

bool

destination_folder

The folder where downloaded papers will be saved.

Type:

str

zip_folder

Whether to zip the downloaded papers.

Type:

bool
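The two combined-score formulas given under score_threshold can be sketched in Python as follows. This is an illustration only, not the library's code: the function name combined_score, its arguments, the whole-year reading of "years since the paper was published", and the floor of 1 on that value (to avoid dividing by zero for a paper from the current year) are all assumptions.

```python
from datetime import date


def combined_score(citation_count, published, today,
                   sort_by="relevance", recency_weight=3.5):
    """Sketch of the two combined-score formulas (hypothetical helper)."""
    if sort_by == "date":
        # days_ago: number of days since the paper was published.
        days_ago = (today - published).days
        denominator = ((365 + days_ago) / 365) ** recency_weight
    else:
        # Assumption: recency is a whole-year difference, floored at 1
        # so that a paper from the current year does not divide by zero.
        recency = max(today.year - published.year, 1)
        denominator = recency ** recency_weight
    return citation_count / denominator


# A paper with 100 citations, published exactly 365 days before "today".
score = combined_score(100, date(2023, 1, 1), date(2024, 1, 1),
                       sort_by="date", recency_weight=2.0)
```

With these inputs the denominator is ((365 + 365) / 365) ** 2.0 = 4, so a higher recency_weight shrinks the score of older papers faster.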

Example

>>> search = AutoSearch("machine learning", num_results=10)
>>> search.run()

__init__(keywords, num_results=30, delay=1, sort_by='relevance', date_cutoff='2024-01-01', score_threshold=0.5, recency_weight=3.5, auto_destination=False, destination_folder='search_results', zip_folder=True)[source]

Initialize the AutoSearch class with the given parameters.

Parameters:
  • keywords (str | list[str]) – The keyword(s) to search for.

  • num_results (int) – The number of results to retrieve.

  • delay (int) – Delay (in seconds) between requests.

  • sort_by (str) – Sorting criteria (“date” or “relevance”).

  • date_cutoff (str) – Cutoff date for date-based search (format: “YYYY-MM-DD”).

  • score_threshold (float) – Minimum combined score for papers.

  • recency_weight (float) – Weight for recency in combined score calculation.

  • auto_destination (bool) – Whether to auto-generate the destination folder name.

  • destination_folder (str) – Folder to save downloaded papers.

  • zip_folder (bool) – Whether to zip the downloaded papers.

Return type:

None

search_papers_by_keyword(keyword)[source]

Search for papers by a given keyword and retrieve their details.

Parameters:

keyword (str) – The keyword to search for.

Returns:

A list of dictionaries containing paper details.

Return type:

list[dict[str, Any]]

Example

>>> search = AutoSearch("machine learning")
>>> papers = search.search_papers_by_keyword("machine learning")
>>> len(papers) > 0
True

display_and_download(papers_info, verbose=True)[source]

Display the details of the papers and optionally download them.

Parameters:
  • papers_info (list[dict[str, Any]]) – A list of dictionaries containing paper details.

  • verbose (bool) – Whether to display detailed information.

Return type:

None

Example

>>> search = AutoSearch("machine learning")
>>> papers = search.search_papers_by_keyword("machine learning")
>>> search.display_and_download(papers)
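As a rough illustration of how score_threshold could gate which papers are displayed or downloaded, the sketch below filters a list of paper dictionaries. The helper filter_by_score and the dictionary keys "title" and "combined_score" are hypothetical; the actual keys returned by search_papers_by_keyword are not documented here, and treating the threshold as inclusive is also an assumption.

```python
def filter_by_score(papers_info, score_threshold=0.5):
    """Keep papers whose combined score is at least the threshold."""
    # "combined_score" is a hypothetical key; real dictionaries may differ.
    return [
        paper for paper in papers_info
        if paper.get("combined_score", 0.0) >= score_threshold
    ]


papers = [
    {"title": "Paper A", "combined_score": 12.3},
    {"title": "Paper B", "combined_score": 0.2},
]
kept = filter_by_score(papers)
```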

perform_a_search(keyword)

Perform a single search for a given keyword and process the results.

Parameters:

keyword (str) – The keyword to search for.

Return type:

None

Example

>>> search = AutoSearch("machine learning")
>>> search.perform_a_search("machine learning")

run()[source]

Execute the search based on the initialized parameters.

Return type:

None

Example

>>> search = AutoSearch("machine learning")
>>> search.run()
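The keywords parameter accepts either a string (one search) or a list (one search per entry). A minimal sketch of that dispatch is shown below; run_searches and its search_one callback are hypothetical names standing in for whatever run() does internally, which this page does not describe.

```python
def run_searches(keywords, search_one):
    """Call search_one once for a string, or once per item for a list."""
    # Normalize a lone string to a one-element list so both cases share a loop.
    keyword_list = [keywords] if isinstance(keywords, str) else list(keywords)
    for keyword in keyword_list:
        search_one(keyword)
    return keyword_list


seen = []
run_searches("machine learning", seen.append)
run_searches(["graph neural networks", "diffusion models"], seen.append)
```

The isinstance check matters because a string is itself iterable; looping over it directly would search character by character.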