I'm attempting to scrape a site's list of news articles, capturing topic, headline, author, and date published into Google Sheets using IMPORTXML. I've got the first two, but the last two are assembled somewhat confusingly.
The website has a page where all of its stories are listed chronologically. In the source of that page, the author and date published are rendered thus within a div:
By <span class="post-item-river__byline___mU1tP author vcard"><a class="byline-link url fn n" href="https://www.fakeurlgoeshere.com">Author Name</a></span><time class="post-item-river__date___1Dcq1 entry-date published" datetime="20XX-XX-XXTXX:XX:XX-XX:XX">Date Published</time>
How this displays on site: Author Name·Date Published
How this displays when scraped in IMPORTXML: Author NameDate Published
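To show where the run-together text comes from, here is a minimal sketch outside Sheets, using Python's standard-library `xml.etree.ElementTree`. It assumes the markup above sits directly inside a `<div>` (the wrapper here is hypothetical, and the real page may nest things differently). It is only meant to illustrate the structure, not how IMPORTXML works internally.

```python
import xml.etree.ElementTree as ET

# The byline markup from the page, wrapped in a hypothetical <div>
# so that it parses as a single element.
snippet = (
    '<div>By <span class="post-item-river__byline___mU1tP author vcard">'
    '<a class="byline-link url fn n" href="https://www.fakeurlgoeshere.com">'
    'Author Name</a></span>'
    '<time class="post-item-river__date___1Dcq1 entry-date published" '
    'datetime="20XX-XX-XXTXX:XX:XX-XX:XX">Date Published</time></div>'
)
root = ET.fromstring(snippet)

# Reading the text of the whole div joins every text node with no separator.
combined = "".join(root.itertext())

# Selecting the child elements individually keeps the two values apart.
author = root.find("./span/a").text
time_el = root.find("./time")
published = time_el.text
published_iso = time_el.get("datetime")  # machine-readable timestamp attribute

print(combined)
print(author)
print(published)
```

So the author and the date do live in separate elements (`span/a` and `time`), and the `time` element also carries a `datetime` attribute. What I have not managed to do is express that separation in a form IMPORTXML accepts on the live page.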
I would like Author Name and Date Published to be returned as separate fields. How do I accomplish this?
I have tried several XPath arguments, including numerous variants targeting the div and time elements, but none of them worked: the output is always 'Imported content is empty'.