{ "cells": [ { "cell_type": "markdown", "id": "34574712-2e7a-47ee-9f7a-89e221dbd12e", "metadata": {}, "source": [ "# Project Extension: Generating a Tech Trend Report from Hacker News\n", "\n", "The GitHubSentinel project is not designed around a single information source.\n", "\n", "On the contrary, its modular design leaves ample room for adding new information sources and generating different types of reports.\n", "\n", "## HackerNewsClient Design and Implementation\n", "\n", "The code in this tutorial was generated with ChatGPT (GPT-4o) and illustrates how the code was refined over several turns of conversation.\n", "\n", "Full conversation link: https://chatgpt.com/share/a1cb2dca-06a6-4258-9038-c7271e22cf77" ] }, { "cell_type": "markdown", "id": "72955239-fa22-4bac-bb3f-0a01948c75c0", "metadata": {}, "source": [ "\n", "\n", "## Prompt\n", "\n", "Generate a Python snippet that scrapes the latest top stories from Hacker News.\n", "\n", "## ChatGPT Response\n", "\n", "Below is a Python snippet that scrapes the latest Hacker News top stories. It uses the `requests` library to make the HTTP request and `BeautifulSoup` to parse the HTML and extract each story's title and link.\n", "\n", "```python\n", "import requests\n", "from bs4 import BeautifulSoup\n", "\n", "def fetch_hackernews_top_stories():\n", "    url = 'https://news.ycombinator.com/'\n", "    response = requests.get(url)\n", "    response.raise_for_status()  # raise an exception if the request failed\n", "\n", "    soup = BeautifulSoup(response.text, 'html.parser')\n", "    stories = soup.find_all('tr', class_='athing')\n", "\n", "    top_stories = []\n", "    for story in stories:\n", "        title_tag = story.find('a', class_='storylink')\n", "        title = title_tag.text\n", "        link = title_tag['href']\n", "        top_stories.append({'title': title, 'link': link})\n", "\n", "    return top_stories\n", "\n", "if __name__ == \"__main__\":\n", "    stories = fetch_hackernews_top_stories()\n", "    for idx, story in enumerate(stories, start=1):\n", "        print(f\"{idx}. {story['title']}\")\n", "        print(f\"   Link: {story['link']}\")\n", "```\n", "\n", "### Code explanation:\n", "1. **The `fetch_hackernews_top_stories` function**: sends an HTTP request for the Hacker News front page and parses the HTML content with `BeautifulSoup`.\n", "2. **The `soup.find_all` call**: finds all `tr` tags with the class name `athing`, each of which holds one story.\n", "3. **Extracting titles and links**: pulls the title (the `storylink` class) and the corresponding link from each story row and stores them in a dictionary.\n", "4. 
**Return value**: returns a list of dictionaries containing the story titles and links.\n", "\n", "### Running the code:\n", "After running this code, the titles and links of the latest Hacker News top stories are printed to the console.\n", "\n", "Make sure the `requests` and `beautifulsoup4` libraries are installed:\n", "```bash\n", "pip install requests beautifulsoup4\n", "```\n", "\n", "This snippet is a simple, effective way to grab the top-story information from the Hacker News site." ] }, { "cell_type": "code", "execution_count": 1, "id": "fdcc843e-a841-4850-8fb5-3ea37956606c", "metadata": {}, "outputs": [ { "ename": "AttributeError", "evalue": "'NoneType' object has no attribute 'text'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", "Cell \u001b[0;32mIn[1], line 22\u001b[0m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m top_stories\n\u001b[1;32m 21\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;18m__name__\u001b[39m \u001b[38;5;241m==\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m__main__\u001b[39m\u001b[38;5;124m\"\u001b[39m:\n\u001b[0;32m---> 22\u001b[0m stories \u001b[38;5;241m=\u001b[39m \u001b[43mfetch_hackernews_top_stories\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 23\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m idx, story \u001b[38;5;129;01min\u001b[39;00m \u001b[38;5;28menumerate\u001b[39m(stories, start\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m1\u001b[39m):\n\u001b[1;32m 24\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;132;01m{\u001b[39;00midx\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m. 
\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mstory[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mtitle\u001b[39m\u001b[38;5;124m'\u001b[39m]\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m\"\u001b[39m)\n", "Cell \u001b[0;32mIn[1], line 15\u001b[0m, in \u001b[0;36mfetch_hackernews_top_stories\u001b[0;34m()\u001b[0m\n\u001b[1;32m 13\u001b[0m \u001b[38;5;28;01mfor\u001b[39;00m story \u001b[38;5;129;01min\u001b[39;00m stories:\n\u001b[1;32m 14\u001b[0m title_tag \u001b[38;5;241m=\u001b[39m story\u001b[38;5;241m.\u001b[39mfind(\u001b[38;5;124m'\u001b[39m\u001b[38;5;124ma\u001b[39m\u001b[38;5;124m'\u001b[39m, class_\u001b[38;5;241m=\u001b[39m\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mstorylink\u001b[39m\u001b[38;5;124m'\u001b[39m)\n\u001b[0;32m---> 15\u001b[0m title \u001b[38;5;241m=\u001b[39m \u001b[43mtitle_tag\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtext\u001b[49m\n\u001b[1;32m 16\u001b[0m link \u001b[38;5;241m=\u001b[39m title_tag[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mhref\u001b[39m\u001b[38;5;124m'\u001b[39m]\n\u001b[1;32m 17\u001b[0m top_stories\u001b[38;5;241m.\u001b[39mappend({\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mtitle\u001b[39m\u001b[38;5;124m'\u001b[39m: title, \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mlink\u001b[39m\u001b[38;5;124m'\u001b[39m: link})\n", "\u001b[0;31mAttributeError\u001b[0m: 'NoneType' object has no attribute 'text'" ] } ], "source": [ "import requests\n", "from bs4 import BeautifulSoup\n", "\n", "def fetch_hackernews_top_stories():\n", "    url = 'https://news.ycombinator.com/'\n", "    response = requests.get(url)\n", "    response.raise_for_status()  # raise an exception if the request failed\n", "\n", "    soup = BeautifulSoup(response.text, 'html.parser')\n", "    stories = soup.find_all('tr', class_='athing')\n", "\n", "    top_stories = []\n", "    for story in stories:\n", "        title_tag = story.find('a', class_='storylink')\n", "        title = title_tag.text\n", "        link = title_tag['href']\n", "        top_stories.append({'title': title, 'link': link})\n", "\n", "    return 
top_stories\n", "\n", "if __name__ == \"__main__\":\n", "    stories = fetch_hackernews_top_stories()\n", "    for idx, story in enumerate(stories, start=1):\n", "        print(f\"{idx}. {story['title']}\")\n", "        print(f\"   Link: {story['link']}\")\n" ] }, { "cell_type": "markdown", "id": "c277e45a-1ea6-4641-be89-0c4a3d61f772", "metadata": {}, "source": [ "## Prompt\n", "\n", "The following error occurred:\n", "\n", "---------------------------------------------------------------------------\n", "AttributeError Traceback (most recent call last)\n", "Cell In[1], line 22\n", " 19 return top_stories\n", " 21 if __name__ == \"__main__\":\n", "---> 22 stories = fetch_hackernews_top_stories()\n", " 23 for idx, story in enumerate(stories, start=1):\n", " 24 print(f\"{idx}. {story['title']}\")\n", "\n", "Cell In[1], line 15, in fetch_hackernews_top_stories()\n", " 13 for story in stories:\n", " 14 title_tag = story.find('a', class_='storylink')\n", "---> 15 title = title_tag.text\n", " 16 link = title_tag['href']\n", " 17 top_stories.append({'title': title, 'link': link})\n", "\n", "AttributeError: 'NoneType' object has no attribute 'text'\n", "\n", "-------------------------------------------------\n", "Here is some reference information and a few suggestions:\n", "- Use your web-browsing capability to inspect the Hacker News site (https://news.ycombinator.com/) and analyze its front-end code to work out the data structure.\n", "- If you cannot access the front-end code, refer to the attachment: an offline HTML file of Hacker News\n", "\n", "## ChatGPT Response\n", "\n", "Judging from the uploaded Hacker News HTML file, the HTML structure differs slightly from what the original code assumed. In particular, the class name on the story-title link is `titleline` rather than `storylink`, which is why the previous code could not extract the titles. Here is the corrected code:\n", "\n", "```python\n", "import requests\n", "from bs4 import BeautifulSoup\n", "\n", "def fetch_hackernews_top_stories():\n", "    url = 'https://news.ycombinator.com/'\n", "    response = requests.get(url)\n", "    response.raise_for_status()  # raise an exception if the request failed\n", "\n", "    soup = BeautifulSoup(response.text, 'html.parser')\n", "    stories = soup.find_all('tr', class_='athing')\n", "\n", "    top_stories = []\n", "    for story in stories:\n", "        title_tag = story.find('a', class_='titleline')\n", "        if title_tag:\n", "            title = 
title_tag.text\n", "            link = title_tag['href']\n", "            top_stories.append({'title': title, 'link': link})\n", "\n", "    return top_stories\n", "\n", "if __name__ == \"__main__\":\n", "    stories = fetch_hackernews_top_stories()\n", "    for idx, story in enumerate(stories, start=1):\n", "        print(f\"{idx}. {story['title']}\")\n", "        print(f\"   Link: {story['link']}\")\n", "```\n", "\n", "### Key changes:\n", "- Changed `story.find('a', class_='storylink')` to `story.find('a', class_='titleline')` to match the correct title-link class.\n", "- Added a check on `title_tag`, making sure it exists before accessing `.text` and the `href` attribute, to guard against the `NoneType` error.\n", "\n", "### Expected result:\n", "This code will correctly output the titles and links of the latest Hacker News top stories." ] }, { "cell_type": "code", "execution_count": 2, "id": "642cc1ee-5764-484a-a3ea-299611a30e78", "metadata": {}, "outputs": [], "source": [ "import requests\n", "from bs4 import BeautifulSoup\n", "\n", "def fetch_hackernews_top_stories():\n", "    url = 'https://news.ycombinator.com/'\n", "    response = requests.get(url)\n", "    response.raise_for_status()  # raise an exception if the request failed\n", "\n", "    soup = BeautifulSoup(response.text, 'html.parser')\n", "    stories = soup.find_all('tr', class_='athing')\n", "\n", "    top_stories = []\n", "    for story in stories:\n", "        title_tag = story.find('a', class_='titleline')\n", "        if title_tag:\n", "            title = title_tag.text\n", "            link = title_tag['href']\n", "            top_stories.append({'title': title, 'link': link})\n", "\n", "    return top_stories\n", "\n", "if __name__ == \"__main__\":\n", "    stories = fetch_hackernews_top_stories()\n", "    for idx, story in enumerate(stories, start=1):\n", "        print(f\"{idx}. 
{story['title']}\")\n", "        print(f\"   Link: {story['link']}\")\n" ] }, { "cell_type": "markdown", "id": "2b12cb3f-d973-43f3-81f7-3c1d42af0372", "metadata": {}, "source": [ "## Prompt\n", "\n", "After running it, there is no output at all.\n", "\n", "## ChatGPT Response\n", "\n", "If there is no output after running it, the class names on the `tr` tags or the structure of other elements in the HTML may have changed. We can adjust the code by examining the HTML structure more carefully. Based on the file you provided, the titles and links should be extracted with the following approach instead.\n", "\n", "Here is an updated version that makes sure every story title and link is extracted correctly:\n", "\n", "```python\n", "import requests\n", "from bs4 import BeautifulSoup\n", "\n", "def fetch_hackernews_top_stories():\n", "    url = 'https://news.ycombinator.com/'\n", "    response = requests.get(url)\n", "    response.raise_for_status()  # raise an exception if the request failed\n", "\n", "    soup = BeautifulSoup(response.text, 'html.parser')\n", "    # find all <tr> rows (class 'athing') that contain stories\n", "    stories = soup.find_all('tr', class_='athing')\n", "\n", "    top_stories = []\n", "    for story in stories:\n", "        title_tag = story.find('span', class_='titleline').find('a')\n", "        if title_tag:\n", "            title = title_tag.text\n", "            link = title_tag['href']\n", "            top_stories.append({'title': title, 'link': link})\n", "\n", "    return top_stories\n", "\n", "if __name__ == \"__main__\":\n", "    stories = fetch_hackernews_top_stories()\n", "    if stories:\n", "        for idx, story in enumerate(stories, start=1):\n", "            print(f\"{idx}. {story['title']}\")\n", "            print(f\"   Link: {story['link']}\")\n", "    else:\n", "        print(\"No stories found.\")\n", "```\n", "\n", "### Key changes:\n", "1. **Finding titles and links**: first locate the `<span>` tag with class `titleline`, then find the `<a>` tag inside it to get each story's title and link.\n", "2. **Output check**: added a check that prints \"No stories found.\" if no stories were extracted.\n", "\n", "### Checking the output:\n", "After running this code, you should see the story titles and links if the HTML structure matches. If there is still no output, make sure the environment running the code can reach the Hacker News site, and that the HTML structure being parsed matches the actual site." ] }, { "cell_type": "code", "execution_count": 4, "id": "b63e7f38-1a0e-4271-85cc-29c17584c78d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1. Ask HN: What are you working on (August 2024)?\n", " Link: item?id=41342017\n", "2. 
Defenders think in lists, attackers think in graphs (2015)\n", " Link: https://github.com/JohnLaTwC/Shared/blob/master/Defenders%20think%20in%20lists.%20Attackers%20think%20in%20graphs.%20As%20long%20as%20this%20is%20true%2C%20attackers%20win.md\n", "3. Neurotechnology Numbers Worth Knowing\n", " Link: https://milan.cvitkovic.net/writing/neurotechnology_numbers_worth_knowing/\n", "4. Papersway – a scrollable window management for Sway/i3wm\n", " Link: https://spwhitton.name/tech/code/papersway/\n", "5. Transfer energy from nitrogen to argon enables 2-wayl cascaded lasing in air\n", " Link: https://phys.org/news/2024-08-mechanism-energy-nitrogen-argon-enables.html\n", "6. AMD's Radeon 890M: Strix Point's Bigger iGPU\n", " Link: https://chipsandcheese.com/2024/08/24/amds-radeon-890m-strix-points-bigger-igpu/\n", "7. Serpent OS Prealpha0 Released\n", " Link: https://serpentos.com/blog/2024/08/01/serpent-os-prealpha0-released/\n", "8. Consistently Making Wrong Decisions Whilst Writing Recreational C\n", " Link: https://amodernist.com/texts/fun-c.html\n", "9. You are not dumb, you just lack the prerequisites\n", " Link: https://lelouch.dev/blog/you-are-probably-not-dumb/\n", "10. Generating Mazes\n", " Link: https://healeycodes.com/generating-mazes\n", "11. Show HN: Visualize database schemas with a single query\n", " Link: https://github.com/chartdb/chartdb\n", "12. Small Strings in Rust: smolstr vs. smartstring\n", " Link: https://fasterthanli.me/articles/small-strings-in-rust\n", "13. Hot Page – a graphical site builder\n", " Link: https://hot.page/\n", "14. Shell.how: Explain Shell Commands\n", " Link: https://www.shell.how\n", "15. \"YOLO\" is not a valid hash construction\n", " Link: https://blog.trailofbits.com/2024/08/21/yolo-is-not-a-valid-hash-construction/\n", "16. Founder and CEO of Telegram arrested at French airport – report\n", " Link: https://old.reddit.com/r/worldnews/comments/1f0g013/founder_and_ceo_of_telegram_arrested_at_french/\n", "17. 
Postgres as a Search Engine\n", " Link: https://anyblockers.com/posts/postgres-as-a-search-engine\n", "18. NASA announces Boeing Starliner crew will return on SpaceX Crew-9\n", " Link: https://twitter.com/NASA/status/1827393397939634503\n", "19. Julius: Open-source reimplementation of Caesar III\n", " Link: https://github.com/bvschaik/julius\n", "20. ArcticDB: Why a Hedge Fund Built Its Own Database\n", " Link: https://www.infoq.com/presentations/arcticdb/\n", "21. Birds aren't real – how to create your own \"bird\"\n", " Link: https://www.lampysecurity.com/post/birds-aren-t-real-how-to-create-your-own-bird\n", "22. Techniques for safe garbage collection in Rust\n", " Link: https://kyju.org/blog/rust-safe-garbage-collection/\n", "23. Ruby's official documentation just got a new look\n", " Link: https://docs.ruby-lang.org/en/master/\n", "24. Golang Interpreter Written in PHP\n", " Link: https://github.com/tuqqu/go-php\n", "25. Telegram founder Pavel Durov arrested at French airport\n", " Link: https://www.theguardian.com/media/article/2024/aug/24/telegram-app-founder-pavel-durov-arrested-at-french-airport\n", "26. Implementing React from Scratch\n", " Link: https://www.rob.directory/blog/react-from-scratch\n", "27. Show HN: High school robotics code/CAD/design binder release\n", " Link: https://www.chiefdelphi.com/t/team-341-miss-daisy-2024-cad-code-and-design-binder-release/467546\n", "28. The Vital Necessity of Very Old Books (2023)\n", " Link: https://www.millersbookreview.com/p/vital-necessity-of-very-old-books\n", "29. OneText (YC W23) Is Hiring a Chief of Staff\n", " Link: https://www.ycombinator.com/companies/onetext/jobs/B1RIFNb-founding-chief-of-staff\n", "30. 
Anthropic Claude 3.5 can create icalendar files, so I did this\n", " Link: https://gregsramblings.com/stupid-but-useful-ai-tricks-creating-calendar-entries-from-an-image-using-anthropic-claude-35\n" ] } ], "source": [ "import requests\n", "from bs4 import BeautifulSoup\n", "\n", "def fetch_hackernews_top_stories():\n", "    url = 'https://news.ycombinator.com/'\n", "    response = requests.get(url)\n", "    response.raise_for_status()  # raise an exception if the request failed\n", "\n", "    soup = BeautifulSoup(response.text, 'html.parser')\n", "    # find all <tr> rows (class 'athing') that contain stories\n", "    stories = soup.find_all('tr', class_='athing')\n", "\n", "    top_stories = []\n", "    for story in stories:\n", "        title_tag = story.find('span', class_='titleline').find('a')\n", "        if title_tag:\n", "            title = title_tag.text\n", "            link = title_tag['href']\n", "            top_stories.append({'title': title, 'link': link})\n", "\n", "    return top_stories\n", "\n", "if __name__ == \"__main__\":\n", "    stories = fetch_hackernews_top_stories()\n", "    if stories:\n", "        for idx, story in enumerate(stories, start=1):\n", "            print(f\"{idx}. {story['title']}\")\n", "            print(f\"   Link: {story['link']}\")\n", "    else:\n", "        print(\"No stories found.\")\n" ] }, { "cell_type": "markdown", "id": "359abdfb-92fa-4349-8530-9ba1624026ef", "metadata": {}, "source": [ "## Trend Report Generation: Design and Implementation\n", "\n", "**Use the OpenAI Playground to test and refine the Hacker News trend-report prompt.**\n", "\n", "Playground link: https://platform.openai.com/playground/p/WxNyR3sxUk5NeXILvaMCVyc1?model=undefined&mode=chat\n", "\n", "### Practice Makes a ChatGPT Master\n", "\n", "![chatgpt_master](images/chatgpt_master.jpg)\n", "\n", "\n", "### A Starter Prompt\n", "\n", "```\n", "You are a technology expert who follows Hacker News and excels at spotting technical hot spots and development trends.\n", "\n", "Task:\n", "Based on the Hacker News top list you receive, analyze and summarize the hot topics currently being discussed in the tech community, in no more than 5 items.\n", "\n", "Format:\n", "# Hacker News Tech Insights\n", "\n", "## Date: {today's date}\n", "\n", "## Frontier Trends and Hot Topics\n", "\n", "1. **Personal projects and creation**: Many users shared the projects they are working on in the \"Ask HN\" thread, highlighting the developer community's sustained enthusiasm for personal creation and entrepreneurship.\n", "\n", "2. 
**Security mindsets**: The discussion of how \"defenders think in lists while attackers think in graphs\" prompted deeper reflection on security strategy. The contrast underscores the psychological and technical differences between defense and offense, and suggests growing attention to attacker strategies in the security field.\n", "```\n", "\n", "### GPT-4o-mini API Response\n", "\n", "```\n", "# Hacker News Tech Insights\n", "\n", "## Date: August 24, 2024\n", "\n", "## Frontier Trends and Hot Topics\n", "\n", "1. **Personal projects and creation**: In \"Ask HN\", users actively shared their projects and creations. This atmosphere shows that the developer community's enthusiasm for self-expression and entrepreneurship remains high.\n", "\n", "2. **New thinking in security**: The topic contrasting \"defender and attacker mindsets\" drew broad attention. This differentiated thinking emphasizes a deeper understanding of cybersecurity and points to a steadily growing demand for innovative security strategies.\n", "\n", "3. **Advances in neurotechnology**: An article on neurotechnology shared quantified development figures, prompting the tech community to watch this fast-moving field, especially its potential applications in healthcare and human augmentation.\n", "\n", "4. **The rise of open-source projects**: Open-source projects such as Serpent OS and Julius caught the tech community's attention, reflecting the importance of open-source software in driving innovation and developer collaboration.\n", "\n", "5. **Rust language discussions**: Multiple discussions about Rust, notably articles on small-string handling and safe garbage-collection techniques, show that the language's use in modern programming is growing quickly and that developer interest in its performance and safety properties keeps rising.\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "4dbd5723-2652-45e9-869d-d59f93c0d65e", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "id": "de0561c4-2b45-411e-ba0a-17b8a12949cb", "metadata": {}, "source": [ "## Homework: Starting from GitHubSentinel v0.5, extend it to generate Hacker News trend reports.\n", "\n", "### Implementation priority: Daemon (required) > Gradio > Command" ] }, { "cell_type": "code", "execution_count": null, "id": "d395e03e-cdc6-4aaa-a0ce-114ccab85be0", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 5 }