Source

Bases: BaseModel

Represent a data source, including bibliographic and web information.

Attributes:

  • title (str) –

    Title of the source.

  • authors (str) –

    Authors of the source.

  • url (Optional[str]) –

    URL of the source.

  • url_archive (Optional[str]) –

    Archived URL.

  • url_date (Optional[str]) –

    Date the URL was accessed.

  • url_date_archive (Optional[str]) –

    Date the URL was archived.

Methods:

  • __eq__

    Check for equality with another Source object.

  • __hash__

    Return a hash value for the Source instance based on all attributes.

  • __str__

    Return a string representation of the Source, including all available attributes.

  • ensure_in_wayback

    Ensure that the source URL is archived in the Wayback Machine.

  • retrieve_from_wayback

    Download a file from the Wayback Machine and save it to a specified path.

authors instance-attribute

authors: Annotated[str, Field(description='Authors of the source.')]

title instance-attribute

title: Annotated[str, Field(description='Title of the source.')]

url class-attribute instance-attribute

url: Annotated[str | None, Field(description='URL of the source.')] = None

url_archive class-attribute instance-attribute

url_archive: Annotated[str | None, Field(description='Archived URL.')] = None

url_date class-attribute instance-attribute

url_date: Annotated[str | None, Field(description='Date the URL was accessed.')] = None

url_date_archive class-attribute instance-attribute

url_date_archive: Annotated[str | None, Field(description='Date the URL was archived.')] = None

__eq__

__eq__(other: object) -> bool

Check for equality with another Source object.

Compares all attributes of the current instance with those of the other object.

Parameters:

  • other (object) –

    The object to compare with. Expected to be an instance of Source.

Returns:

  • bool

    True if all non-None attributes are equal between self and other, False otherwise. Returns False if other is not a Source instance.

__hash__

__hash__() -> int

Return a hash value for the Source instance based on all attributes.

This method computes a combined hash of the instance's attributes so that Source objects can be used in hash-based collections such as sets and dictionaries. Instances that compare equal produce the same hash value.

Returns:

  • int

    The hash value of the Source instance.
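The interplay between __eq__ and __hash__ can be illustrated with a minimal sketch. It uses a hypothetical stand-in dataclass, SourceSketch, rather than the real Source class, and assumes that equality and hashing both cover all attributes; the library's actual implementation may differ.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for Source; NOT the library's implementation.
# frozen=True makes the dataclass generate __eq__ and __hash__ from all fields.
@dataclass(frozen=True)
class SourceSketch:
    title: str
    authors: str
    url: Optional[str] = None
    url_archive: Optional[str] = None
    url_date: Optional[str] = None
    url_date_archive: Optional[str] = None


a = SourceSketch(title="Example Site", authors="The Authors", url="http://example.com")
b = SourceSketch(title="Example Site", authors="The Authors", url="http://example.com")
c = SourceSketch(title="Example Site", authors="Other Authors", url="http://example.com")

# a and b have identical attribute values, so they compare equal and hash
# equally; a set therefore keeps only one of them alongside c.
unique_sources = {a, b, c}
```

Under these assumptions, unique_sources holds two entries, which is the behaviour needed to deduplicate sources collected from several datasets.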

__str__

__str__() -> str

Return a string representation of the Source, including all available attributes.

Returns:

  • str

    A string detailing the source's information.

ensure_in_wayback

ensure_in_wayback() -> None

Ensure that the source URL is archived in the Wayback Machine.

This method checks whether the source's url attribute is set and whether an archived URL or archive date is already present. If neither is present, it attempts to archive the URL in the Wayback Machine and updates the url_archive and url_date_archive attributes accordingly.

Returns:

  • None

    This method updates the Source object's url_archive and url_date_archive attributes in place.

Raises:

  • ValueError

    If the url attribute is not set (None or NaN).

Examples:

>>> from technologydata import Source
>>> source = Source(url="http://example.com", title="Example Site", authors="The Authors")
>>> source.ensure_in_wayback()
A new snapshot has been stored for the url http://example.com with timestamp 2023-10-01T12:00:00Z and Archive.org url http://web.archive.org/web/20231001120000/http://example.com.
>>> source.url_archive
'http://web.archive.org/web/20231001120000/http://example.com'
>>> source.url_date_archive
'2023-10-01T12:00:00Z'
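The control flow described for ensure_in_wayback can be sketched as a standalone function. This is only an illustration under stated assumptions: ensure_archived_sketch and fake_archive are hypothetical names, the archiver is injected as a callable instead of contacting the Wayback Machine, and the NaN check mentioned under Raises is omitted.

```python
from typing import Callable, Optional, Tuple


def ensure_archived_sketch(
    url: Optional[str],
    url_archive: Optional[str],
    url_date_archive: Optional[str],
    archive: Callable[[str], Tuple[str, str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper mirroring the documented control flow."""
    if url is None:
        raise ValueError("url must be set before archiving")
    # Skip archiving when an archived URL or archive date is already recorded.
    if url_archive is not None or url_date_archive is not None:
        return url_archive, url_date_archive
    # Otherwise ask the injected (hypothetical) archiver for a new snapshot.
    return archive(url)


def fake_archive(url: str) -> Tuple[str, str]:
    # Stand-in for a real Wayback Machine request; returns fixed values.
    return (f"http://web.archive.org/web/20231001120000/{url}", "2023-10-01T12:00:00Z")


result = ensure_archived_sketch("http://example.com", None, None, fake_archive)
```

Injecting the archiver keeps the sketch runnable offline; the real method updates the Source instance in place rather than returning a tuple.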

retrieve_from_wayback

retrieve_from_wayback(download_directory: Path) -> Path | None

Download a file from the Wayback Machine and save it to a specified path.

The method retrieves an archived file from the Wayback Machine using the URL stored in the instance's url_archive attribute. The file format is determined from the Content-Type field of the response header or from the extension that can be extracted from the URL.

Parameters:

  • download_directory (Path) –

    The base path where the file will be saved.

Returns:

  • Path | None

    The specified path where the file is stored, or None if an error occurs.

Raises:

  • RequestException

    If there is an issue with the HTTP request.

Notes:

  • The url_archive attribute should contain a valid URL.

Examples:

>>> import pathlib
>>> from technologydata import Source
>>> source = Source(title="example01", authors="The Authors", url_archive="http://web.archive.org/web/20231001120000/http://example.com")
>>> output_path = source.retrieve_from_wayback(pathlib.Path("base_path"))
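How a file extension might be derived from the Content-Type header or the URL can be sketched with the standard library. The helper name guess_extension_sketch is hypothetical, and the order of preference (header first, URL second) is an assumption; the library may resolve the format differently.

```python
import mimetypes
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def guess_extension_sketch(content_type: Optional[str], url: str) -> Optional[str]:
    """Hypothetical helper: prefer the Content-Type header, fall back to the URL."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        mime_type = content_type.split(";")[0].strip().lower()
        extension = mimetypes.guess_extension(mime_type)
        if extension:
            return extension
    # Fall back to the suffix of the URL path, if it has one.
    suffix = PurePosixPath(urlparse(url).path).suffix
    return suffix or None


from_header = guess_extension_sketch("application/pdf; charset=binary", "http://example.com/report")
from_url = guess_extension_sketch(None, "http://example.com/files/data.csv")
```

Here from_header is resolved from the header alone, while from_url relies on the path suffix because no Content-Type is supplied.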