Source
Bases: BaseModel
Represent a data source, including bibliographic and web information.
Attributes:
- title (str) – Title of the source.
- authors (str) – Authors of the source.
- url (Optional[str]) – URL of the source.
- url_archive (Optional[str]) – Archived URL.
- url_date (Optional[str]) – Date the URL was accessed.
- url_date_archive (Optional[str]) – Date the URL was archived.
Methods:
- __eq__ – Check for equality with another Source object.
- __hash__ – Return a hash value for the Source instance based on all attributes.
- __str__ – Return a string representation of the Source, including all available attributes.
- ensure_in_wayback – Ensure that the source URL is archived in the Wayback Machine.
- retrieve_from_wayback – Download a file from the Wayback Machine and save it to a specified path.
url
class-attribute
instance-attribute
url: Annotated[str | None, Field(description='URL of the source.')] = None
url_archive
class-attribute
instance-attribute
url_archive: Annotated[str | None, Field(description='Archived URL.')] = None
url_date
class-attribute
instance-attribute
url_date: Annotated[str | None, Field(description='Date the URL was accessed.')] = None
url_date_archive
class-attribute
instance-attribute
url_date_archive: Annotated[str | None, Field(description='Date the URL was archived.')] = None
__eq__
__eq__(other: object) -> bool
Check for equality with another Source object.
Compares all attributes of the current instance with those of the other object.
Parameters:
- other (object) – The object to compare with. Expected to be an instance of Source.
Returns:
- bool – True if all non-None attributes are equal between self and other, False otherwise. Returns False if other is not a Source instance.
__hash__
__hash__() -> int
Return a hash value for the Source instance based on all attributes.
This method computes a combined hash of the instance's attributes to uniquely identify the object in hash-based collections such as sets and dictionaries.
Returns:
- int – The hash value of the Source instance.
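The contract between __eq__ and __hash__ can be illustrated with a small stand-in class. This is a sketch only: SourceSketch is a hypothetical dataclass, not the library's pydantic model, and it assumes that every attribute, including those set to None, takes part in both the comparison and the hash.

```python
from dataclasses import astuple, dataclass
from typing import Optional


@dataclass(eq=False)  # eq=False so the hand-written methods below are used
class SourceSketch:
    """Hypothetical stand-in for Source; not the library's implementation."""

    title: str
    authors: str
    url: Optional[str] = None
    url_archive: Optional[str] = None
    url_date: Optional[str] = None
    url_date_archive: Optional[str] = None

    def __eq__(self, other: object) -> bool:
        # Anything that is not a SourceSketch compares unequal.
        if not isinstance(other, SourceSketch):
            return False
        # Assumption: compare every attribute, None values included.
        return astuple(self) == astuple(other)

    def __hash__(self) -> int:
        # Hash the same tuple used for equality so equal objects hash alike.
        return hash(astuple(self))


a = SourceSketch(title="Example Site", authors="The Authors", url="http://example.com")
b = SourceSketch(title="Example Site", authors="The Authors", url="http://example.com")
unique_sources = {a, b}  # equal objects collapse into a single set entry
```

Because the hash is derived from attribute values, changing an attribute after the object has been placed in a set or used as a dictionary key can make later lookups fail.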
__str__
__str__() -> str
Return a string representation of the Source, including all available attributes.
Returns:
- str – A string detailing the source's information.
ensure_in_wayback
ensure_in_wayback() -> None
Ensure that the source URL is archived in the Wayback Machine.
This method checks whether the source's url attribute is set and whether an archived URL or archive date is already present. If neither is available, it attempts to archive the URL using the Wayback Machine and updates the corresponding attributes.
Returns:
- None – This method updates the Source object's url_archive and url_date_archive attributes in place.
Raises:
- ValueError – If the url attribute is not set (None or NaN).
Examples:
>>> from technologydata import Source
>>> source = Source(url="http://example.com", title="Example Site", authors="The Authors")
>>> source.ensure_in_wayback()
A new snapshot has been stored for the url http://example.com with timestamp 2023-10-01T12:00:00Z and Archive.org url http://web.archive.org/web/20231001120000/http://example.com.
>>> source.url_archive
'http://web.archive.org/web/20231001120000/http://example.com'
>>> source.url_date_archive
'2023-10-01T12:00:00Z'
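The control flow described for ensure_in_wayback can be sketched without network access. This is a minimal sketch under stated assumptions: SourceSketch, ensure_archived, the Archiver type and fake_archive are all hypothetical names, the archiving step is injected as a callable instead of calling archive.org, and the NaN check mentioned under Raises is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical archiver signature: takes a URL, returns (archive_url, timestamp).
Archiver = Callable[[str], Tuple[str, str]]


@dataclass
class SourceSketch:
    """Stand-in holding only the fields this control flow touches."""

    url: Optional[str] = None
    url_archive: Optional[str] = None
    url_date_archive: Optional[str] = None


def ensure_archived(source: SourceSketch, archive: Archiver) -> None:
    # A missing URL is an error, mirroring the documented ValueError.
    if source.url is None:
        raise ValueError("url is not set")
    # Archive only when neither archive field is already populated.
    if source.url_archive is None and source.url_date_archive is None:
        source.url_archive, source.url_date_archive = archive(source.url)


calls: List[str] = []


def fake_archive(url: str) -> Tuple[str, str]:
    # Records the call and returns canned values instead of contacting archive.org.
    calls.append(url)
    return (f"http://web.archive.org/web/20231001120000/{url}", "2023-10-01T12:00:00Z")


source = SourceSketch(url="http://example.com")
ensure_archived(source, fake_archive)
ensure_archived(source, fake_archive)  # already archived, so nothing happens
```

Injecting the archiver keeps the sketch testable offline; the real method talks to the Wayback Machine and prints a status message, as the example above shows.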
retrieve_from_wayback
retrieve_from_wayback(download_directory: Path) -> Path | None
Download a file from the Wayback Machine and save it to a specified path.
The method retrieves an archived file from the Wayback Machine using the URL stored in the instance's url_archive attribute. The format of the saved file is determined from the Content-Type field of the HTTP response header or from the extension that can be extracted from the URL.
Parameters:
- download_directory (Path) – The base path where the file will be saved.
Returns:
- Path | None – The specified path where the file is stored, or None if an error occurs.
Raises:
- RequestException – If there is an issue with the HTTP request.
Notes:
- The url_archive attribute should contain a valid URL.
Examples:
>>> import pathlib
>>> from technologydata import Source
>>> source = Source(title="example01", authors="The Authors", url_archive="http://web.archive.org/web/20231001120000/http://example.com")
>>> output_path = source.retrieve_from_wayback(pathlib.Path("base_path"))
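The choice of file format from either the Content-Type header or the URL can be sketched with the standard library. This is an illustration under assumptions, not the library's code: guess_suffix is a hypothetical helper, and trying the header first with the URL as a fallback is an assumed order that the description above does not confirm.

```python
import mimetypes
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def guess_suffix(content_type: Optional[str], url: str) -> Optional[str]:
    """Return a file suffix such as '.pdf', or None if none can be derived."""
    if content_type:
        # Drop parameters such as '; charset=utf-8' before the lookup.
        mime = content_type.split(";", 1)[0].strip().lower()
        extension = mimetypes.guess_extension(mime)
        if extension:
            return extension
    # Assumed fallback: use whatever suffix the URL path ends with.
    suffix = PurePosixPath(urlparse(url).path).suffix
    return suffix or None


from_header = guess_suffix(
    "application/pdf; charset=binary",
    "http://web.archive.org/web/20231001120000/http://example.com/report",
)
from_url = guess_suffix(
    None,
    "http://web.archive.org/web/20231001120000/http://example.com/data.csv",
)
```

A URL-based fallback of this kind is fragile: for an archived bare domain such as http://web.archive.org/web/20231001120000/http://example.com, the path ends in "example.com" and the derived suffix would be ".com".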