Szarny.io

There should be one-- and preferably only one --obvious way to do it.

Python 3 Notes - Web

Accessing Web Content with the requests Module

>>> import requests
>>> req = requests.get("https://www.google.com")
>>> for k, v in req.headers.items():
...     print("[{}]{}".format(k, v))
[Date]Mon, 10 Jul 2017 04:10:28 GMT
[Expires]-1
[Cache-Control]private, max-age=0
...
>>> print(req.content)
b'<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="ja"><head><meta content="\x90\xa2\x8aE...
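The escaped bytes `\x90\xa2\x8aE` in the output above are text that has not been decoded yet. As a minimal sketch, assuming the page was served in Shift_JIS (a guess based on `lang="ja"` and the byte values, not something the response shown here confirms), the bytes can be decoded by hand. The helper name `decode_body` is hypothetical.

```python
def decode_body(raw, encoding="shift_jis"):
    """Decode raw response bytes; undecodable bytes are replaced, not fatal."""
    # The encoding default is an assumption; use whatever the server declares.
    return raw.decode(encoding, errors="replace")


sample = b"\x90\xa2\x8aE"  # the bytes visible in the output above
print(decode_body(sample))
```

Under that assumption the sample decodes to 世界. With requests itself, `req.content` holds the raw bytes and `req.text` the decoded string, and `req.encoding` can be set before reading `req.text` if the detected encoding is wrong.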

 

Web Server

With a Python web framework, you can handle everything from basic request and response processing to more complex tasks such as URL routing, dynamic page generation, and session management.
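To make "request, response, and URL routing" concrete, here is a rough sketch of what a framework handles for you, written against the standard library's WSGI interface only. The names `ROUTES`, `hello`, `not_found`, and `call_app` are hypothetical and exist just for this illustration; real frameworks do considerably more.

```python
from wsgiref.util import setup_testing_defaults


def hello(environ):
    return "200 OK", "Hello!"


def not_found(environ):
    return "404 Not Found", "Not Found"


# URL routing: map a request path to the function that handles it.
ROUTES = {"/": hello}


def app(environ, start_response):
    """A bare WSGI application: look up the handler, then build the response."""
    handler = ROUTES.get(environ.get("PATH_INFO", "/"), not_found)
    status, body = handler(environ)
    payload = body.encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(payload))),
    ]
    start_response(status, headers)
    return [payload]


def call_app(path):
    """Invoke the app in-process, without opening a socket."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")


print(call_app("/"))
print(call_app("/missing"))
```

Bottle and Flask replace the `ROUTES` dictionary with decorators and add templates, sessions, and error handling on top of the same request-in, response-out cycle.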

Bottle

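from bottle import run, route, static_file


# Serve index.html from the current directory at the site root.
@route("/")
def root():
    return static_file("index.html", root=".")


# Greet whatever name appears in the URL path, e.g. /Alice.
@route("/<yourname>")
def yourname(yourname):
    return "Hello! {}".format(yourname)


run(host="localhost", port=9999)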

Flask

from flask import Flask, render_template, request

app = Flask(__name__)


# Serve static/index.html at the site root.
@app.route("/")
def main():
    return app.send_static_file("index.html")


# Collect two path segments and two query parameters, then render them.
@app.route("/echo/<arg1>/<arg2>")
def echo(arg1, arg2):
    k = {}
    k["path1"] = arg1
    k["path2"] = arg2
    k["get1"] = request.args.get("get1")  # None if the parameter is absent
    k["get2"] = request.args.get("get2")
    return render_template("template.html", **k)


app.run(port=9999, debug=True)

templates/template.html

<html>
<head><title>Flask</title></head>
<body>
<p>path1: {{path1}}</p>
<p>path2: {{path2}}</p>
<br>
<p>GET1: {{get1}}</p>
<p>GET2: {{get2}}</p>
</body>
</html>
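To see which part of a URL ends up in which template variable, the sketch below splits an example URL with the standard library. This is only an illustration of the mapping, not how Flask parses requests internally; the function name `explain_echo_url` and the example URL are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs


def explain_echo_url(url):
    """Return the four values the /echo/<arg1>/<arg2> route would hand to the template."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 3 or segments[0] != "echo":
        raise ValueError("not an /echo/<arg1>/<arg2> URL")
    query = parse_qs(parts.query)

    def first(name):
        # Mirror request.args.get: first value, or None when absent.
        values = query.get(name)
        return values[0] if values else None

    return {
        "path1": segments[1],
        "path2": segments[2],
        "get1": first("get1"),
        "get2": first("get2"),
    }


print(explain_echo_url("http://localhost:9999/echo/foo/bar?get1=1&get2=2"))
```

A query parameter missing from the URL comes back as `None`, which the template then renders as the text `None`.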

 

Web Scraping

BeautifulSoup

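import requests
from bs4 import BeautifulSoup


def get_links(url):
    # Name the parser explicitly; omitting it makes bs4 emit a warning.
    html = BeautifulSoup(requests.get(url).text, "html.parser")
    # Collect the href attribute of every <a> tag (None if it has no href).
    links = [element.get("href") for element in html.find_all("a")]
    return links


url = input("URL : ")
for i, link in enumerate(get_links(url), start=1):
    print(i, link)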
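The list returned above can contain `None` (anchors without `href`) and relative paths such as `/about`. As a rough sketch of one way to tidy that up, the variant below skips missing values and resolves relative links against the page URL. Note that it swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages; `LinkCollector` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors that have no href
            self.links.append(urljoin(self.base_url, href))


def extract_links(html_text, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


sample = '<a href="/about">About</a><a name="top">Top</a>'
print(extract_links(sample, "https://example.com/index.html"))
```

The same `urljoin` call can be applied to the result of `get_links` if you prefer to keep BeautifulSoup for parsing.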

 

Reference: 入門 Python 3 (O'Reilly)