Adds new scripts dir and Python bundle_lister.py. Includes:

- Local Makefile to be called from parent Makefile at source/
- README.md
- cloned_repo dir necessary for executing bundle_lister.py
- requirements.txt per developer standard
- template.html also necessary for executing bundle_lister.py

Signed-off-by: Michael Vincerra <michael.vincerra@intel.com>
Michael Vincerra
2018-12-15 18:29:00 -08:00
parent aca22ba939
commit 92c499067d
7 changed files with 231 additions and 0 deletions
+9
@@ -0,0 +1,9 @@
py:
	python bundle_lister.py
	cp bundles.html.txt ../../introduction
	rm -rf cloned_repo/*
	rm bundles.html.txt
	@echo "Python script finished successfully!"
	@echo "Next run make html. Then run make publish."
+19
@@ -0,0 +1,19 @@
`bundle_lister.py` is a Python (3.6.0) web scraper and file generator. First, it clones the clr-bundles repository, https://github.com/clearlinux/clr-bundles. Second, it parses the content of all bundles in the clr-bundles/ directory and the `packages-descriptions` file. Third, it uses Jinja2 to write the result of the analysis to `bundles.html.txt`. This `.txt` file is then referenced in `bundles.rst`, whose title is `Available bundles` and which is currently published at https://clearlinux.org/documentation/clear-linux/reference/bundles.
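
The parsing step can be pictured with a minimal sketch. It assumes bundle files carry `# [TITLE]:` and `# [DESCRIPTION]:` header lines and `include(...)` lines, which is what the script's patterns look for. The sample text and the `parse_bundle` helper are hypothetical and are not part of `bundle_lister.py`.

```python
import re

# Hypothetical sample written in the style the script's patterns expect.
SAMPLE_BUNDLE = """\
# [TITLE]: editors
# [DESCRIPTION]: Run popular terminal text editors.
include(emacs)
include(vim)
nano
"""

TITLE_RE = re.compile(r"#\s?\[TITLE]:\s*(.*)")
DESC_RE = re.compile(r"#\s?\[DESCRIPTION]:\s*(.*)")
INCLUDE_RE = re.compile(r"^include\(([^()]*)\)")


def parse_bundle(text):
    """Return the title, description, and included bundles found in text."""
    result = {"title": None, "description": None, "includes": []}
    for line in text.splitlines():
        title = TITLE_RE.match(line)
        desc = DESC_RE.match(line)
        include = INCLUDE_RE.match(line)
        if title:
            result["title"] = title.group(1).strip()
        elif desc:
            result["description"] = desc.group(1).strip()
        elif include:
            result["includes"].append(include.group(1).strip())
    return result


parsed = parse_bundle(SAMPLE_BUNDLE)
print(parsed)
```

Lines that match none of the three patterns, such as the plain package name `nano`, are skipped in this sketch.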
`bundle_lister.py` automates documentation so it shows current bundles and packages per daily updates to the clr-bundles GitHub repository.
`bundle_lister.py` will be invoked in a bash script in the `source/Makefile` of clear-linux documentation. Therefore, `bundle_lister.py` will automatically create newly scraped and parsed data upon each build of the clearlinux.org website, and output an accurate, up-to-date table that shows all bundles and packages for interested Linux developers and admins.
See `requirements.txt` for the dependencies needed to run this application. The script targets Python 3.6.0.
To run `bundle_lister.py` in the terminal, enter: `python bundle_lister.py`.
Note: The `cloned_repo` directory must remain in the same directory as `bundle_lister.py` (the script's parent directory) for this code to work, because the script clones into `./cloned_repo/`.
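
One way to guard against a missing `cloned_repo` directory is to create it before cloning. The following is a minimal sketch only; the `ensure_clone_dir` helper is hypothetical and does not exist in `bundle_lister.py`.

```python
import tempfile
from pathlib import Path


def ensure_clone_dir(base_dir):
    """Create base_dir/cloned_repo if it is missing and return its path."""
    clone_dir = Path(base_dir) / "cloned_repo"
    clone_dir.mkdir(parents=True, exist_ok=True)
    return clone_dir


# Demonstrate against a throwaway directory rather than the real scripts dir.
with tempfile.TemporaryDirectory() as scratch:
    clone_dir = ensure_clone_dir(scratch)
    created = clone_dir.is_dir()
    # Calling it again is harmless when the directory already exists.
    same_dir = ensure_clone_dir(scratch) == clone_dir
print(created, same_dir)
```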
Note: A successful build produces a file named `bundles.html.txt` that contains an alphabetized table of current bundles and pundles (packages), with a UTC date and time stamp in the top-right corner.
An unsuccessful build will result in traceback errors, which should be analyzed before running a new build.
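
The alphabetized table and its UTC stamp can be illustrated with a short sketch. The entries are made-up sample data, and `build_rows` and `utc_stamp` are hypothetical helpers. The stamp format here is an assumption chosen to be locale-independent; the template itself formats the time with `%x %H:%M`.

```python
from datetime import datetime, timezone

# Made-up sample entries standing in for parsed bundles and pundles.
bundles = [{"title": "editors"}, {"title": "R-basic"}, {"title": ""}]
pundles = [{"title": "acl"}]


def build_rows(bundle_rows, pundle_rows):
    """Merge both lists, drop untitled entries, and sort case-insensitively."""
    rows = [row for row in bundle_rows + pundle_rows if row.get("title")]
    return sorted(rows, key=lambda row: row["title"].lower())


def utc_stamp(moment=None):
    """Format a moment (default: now, in UTC) as 'YYYY-MM-DD HH:MM UTC'."""
    moment = moment or datetime.now(timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M") + " UTC"


titles = [row["title"] for row in build_rows(bundles, pundles)]
stamp = utc_stamp(datetime(2018, 12, 15, 18, 29, tzinfo=timezone.utc))
print(titles, stamp)
```

Sorting on the lowercased title keeps `R-basic` after `editors`; a plain string sort would place the capitalized name first.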
`~$~`
+98
@@ -0,0 +1,98 @@
import io
import os
import re
from datetime import datetime

import git
import jinja2

GITHUB_BASE = "https://github.com/clearlinux/clr-bundles/tree/master/bundles/"
PUNDLES = "https://github.com/clearlinux/clr-bundles/blob/master/packages"
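# Match the "# [TITLE]:" and "# [DESCRIPTION]:" header lines of a bundle file.
# \s* (rather than \w?) skips optional spacing without consuming the first letter of the value.
PATTERN1 = re.compile(r"#\s?\[TITLE]:\s*(.*)")
PATTERN2 = re.compile(r"#\s?\[DESCRIPTION]:\s*(.*)")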
PATTERN3 = re.compile(r"\(([^()]*|include)\)", re.MULTILINE)
PATTERN4 = re.compile(r"^((?:(?!#)\w+[^-\s][-])\w+|\w+[^\s-])", re.MULTILINE)
# ALT PATTERN4 = re.compile(r"^((?:(?!#)(\w+[^-\s])[-]\w+.)[^\s]{1,}[^\s]|\w+[^\s-])", re.MULTILINE)
PATTERN5 = re.compile(r"^(?!=a)\w.+\s[#]\s(\w+.*)?", re.MULTILINE)
# Previous version: PATTERN5 = re.compile(r"^[^#].*(?<=\s\-\s)(\w+.*)?", re.MULTILINE)


def extractor(lines):
    """Collect the title, description, and included bundles from one bundle file."""
    bundle_title = "title"
    data_desc = "description"
    include_list = []
    for line in lines:
        title = PATTERN1.match(line)
        desc = PATTERN2.match(line)
        includes = PATTERN3.findall(line)
        if title:
            bundle_title = title.group(1).strip()
        if desc:
            data_desc = desc.group(1).strip()
        if includes:
            include_list.append(includes[0].strip("()"))
    # Build the link once the title is known. GITHUB_BASE already ends in "/",
    # and plain concatenation avoids os.path.join inserting "\" on Windows.
    url = GITHUB_BASE + bundle_title
    return {"title": bundle_title, "data_desc": data_desc, "include_list": include_list, "url": url}


def pundler():
    """Collect package ("pundle") names and descriptions from the packages file."""
    with io.open("./cloned_repo/clr-bundles/packages") as file_obj:
        lines = file_obj.readlines()
    pundle_list = []
    pun_desc = []
    for line in lines:
        pundle = PATTERN4.findall(line)
        pundle_plus = PATTERN5.findall(line)
        if pundle:
            pundle_list.append(pundle[0])
        if pundle_plus:
            pun_desc.append(pundle_plus[0].strip("[]"))
    # zip() pairs names and descriptions by position, so this assumes every
    # matched package line also carries a description.
    return [
        {"title": pun, "pun_desc": desc, "purl": PUNDLES}
        for pun, desc in zip(pundle_list, pun_desc)
    ]


def bundler():
    """Clone clr-bundles, parse it, and render bundles.html.txt."""
    data = []
    try:
        git.Git("./cloned_repo/").clone("https://github.com/clearlinux/clr-bundles.git")
    except git.exc.GitCommandError:
        # The clone fails if the repository is already present; reuse that copy.
        # Catching only the git error lets unrelated failures surface.
        pass
    for root, dirs, files in os.walk("./cloned_repo/clr-bundles/bundles", topdown=False):
        for name in files:
            with open(os.path.join(root, name)) as file_obj:
                data.append(extractor(file_obj.readlines()))
    data = data + pundler()
    # Drop entries without a title, then sort case-insensitively by title.
    filtered = [entry for entry in data if entry.get("title")]
    sorted_data = sorted(filtered, key=lambda entry: entry["title"].lower())
    loader = jinja2.FileSystemLoader(searchpath="./")
    env = jinja2.Environment(loader=loader)
    template = env.get_template("template.html")
    output = template.render(data=sorted_data, now=datetime.utcnow())
    with io.open("bundles.html.txt", "w") as out_file:
        out_file.write(output)


if __name__ == "__main__":
    bundler()
+2
@@ -0,0 +1,2 @@
Jinja2==2.10
GitPython==2.1.11
+48
@@ -0,0 +1,48 @@
table {
margin: 32px;
border: 1px solid #e0e0e0;
border-collapse: collapse;
width: auto;
}
th {
font-family: IntelClear-Regular,Helvetica,Arial,sans-serif;
text-align: center;
padding: 5px;
border: #ccc solid 1px;
background-color: #555;
color: #fff;
text-transform: uppercase;
font-size: 18px;
}
tr {
padding-top: 20px ;
padding-bottom: 10px;
}
tbody tr:nth-child(odd) {
background-color: #e0e0e0;
}
.bundlename {
font-family: IntelClear-Regular,Helvetica,Arial,sans-serif;
font-size: 16px;
font-weight: bolder;
padding-left: 6px;
line-height: 18px;
padding-top:7px ;
padding-bottom: 5px;
}
.bundledesc {
font-family: IntelClear-Regular,Helvetica,Arial,sans-serif;
font-style: italic;
font-size: 16px;
padding-left: 6px;
line-height: 18px;
padding-top: 7px ;
padding-bottom: 5px;
}
ul, li {
margin-left: 8px;
/* padding: 0; */
padding-left: 5px;
padding-top: 2px;
line-height: 16px;
}
+55
@@ -0,0 +1,55 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Bundles in Clear Linux* OS</title>
</head>
<body>
<table id="bundletable">
<thead>
<tr>
<th></th>
<th style="text-align:right; font-family:IntelClear-Regular,Helvetica,Arial; font-style:italic">
Updated: {{ now.strftime('%x %H:%M') }} UTC
</th>
</tr>
<tr>
<th> Name</th>
<th> Description</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
</tr>
<tr></tr>
{% for d in data %}
{% if d.url %}
<tr id="bundle">
<td class="bundlename" id="bundle"><a href="{{d.url}}">{{d.title}}</a></td>
<td class="bundledesc">{{d.data_desc}} <br />
{% if d.include_list %}
<p>Includes bundle(s):</p>
<ul>
{% for include in d.include_list %}
<li>{{include}}</li>
{% endfor %}
</ul>
{% endif %}
</td>
</tr>
{% else %}
<tr id="pundle">
<td class="bundlename"><a href="{{d.purl}}">{{d.title}}</a></td>
<td class="bundledesc"> {{d.pun_desc}} </td>
</tr>
{% endif %}
{% endfor %}
</tbody>
</table>
</body>
</html>