Integrating Sphinx doctests with Travis

If you want to run Sphinx doc tests in Travis, you will need to jump through some hoops. Fortunately it isn't that complicated; I've created some sample code in the doctest-travis repository on GitHub for you to follow along.

Doc tests are a way to ensure that your documentation is up to date. In principle, doc tests are largely analogous to unit tests, which is why we want to run them from our CI job. On the other hand, doc tests are part of the documentation, so they are visible to the reader. In practice, if you use Sphinx, you will likely write sample code as a doc test as part of your regular documentation. Actually executing the doc tests in CI ensures that what the reader perceives as sample code works the way they expect it to.

The Python module

Doc tests are currently only supported for Python code; there is no easy way to weave in command line invocations, compile C++ or run Ruby code. We'll start with a simple Python module in doctest_travis/code.py.

def stupid_sum(a, b):
    return a+b+1

def stupid_mul(a, b):
    return a*b*2

Additionally there's some support code in the __init__.py to load these functions into the package by default.
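
I won't reproduce that file here; it boils down to re-exporting the two functions at package level, roughly like this sketch:

# doctest_travis/__init__.py (sketch) -- re-export the functions so that
# "from doctest_travis import *" picks them up
from doctest_travis.code import stupid_sum, stupid_mul

__all__ = ['stupid_sum', 'stupid_mul']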

The basic unit tests

To test our stupid functions, we'll create some basic unit tests. These reside in tests/test_stupid.py and get executed by pytest. They use the Python unittest framework.

import unittest

from doctest_travis import *

class TestStupidFunctions(unittest.TestCase):

    def test_stupid_sum(self):
        self.assertEqual(stupid_sum(1, 2), 4)

    def test_stupid_mul(self):
        self.assertEqual(stupid_mul(2, 2), 8)

Now running pytest tests/ from the top level will execute our unit tests.

Autodoc and doctest

We'll use autodoc to import function signatures from Python. Running doc tests is a lot like running regular unit tests. We can use testsetup and testcleanup directives in the document to initialize and tear down the testing environment. Neither of these directives generates any output.

See the doctest docs for a full description of all options available.

Here's an excerpt that mashes up these two features.

doctest_travis
--------------

.. automodule:: doctest_travis
   :members:


.. autofunction:: doctest_travis.stupid_sum

.. doctest::

    >>> stupid_sum(2, 3)
    6

You provide a Python expression prefixed by >>> on the first line and the expected output on the line below it. The Sphinx doctest module executes the expression, compares the result with the expected output and reports success if they match.

You can find the rendered output of doctest-travis on readthedocs.org.

In the rendered HTML the reader will generally perceive this as sample code. Of course we want to go beyond providing sample code; we want a doc test.

Go to the project's root directory and run the following command to check if our doc tests work.

$ python -msphinx -b doctest docs foo
...
Document: index
---------------
1 items passed all tests:
   2 tests in default
2 tests in 1 items.
2 passed and 0 failed.
Test passed.

Doctest summary
===============
    2 tests
    0 failures in tests
    0 failures in setup code
    0 failures in cleanup code

There we go. Our example code now runs as a doc test.

Tying everything into Travis

We'll use tox to automate the test runs. Unit tests, a sample doc build and the doc test runner all run in the test step, driven by tox. Here is the tox.ini that makes this happen.

[travis]
python =
  2.7: py27
  3.6: py36

[testenv]
deps = 
    pytest
    sphinx
commands = 
    pytest --basetemp={envtmpdir} tests/
    python -msphinx -b html -d {toxworkdir}/_build/doctrees docs {toxworkdir}/_build/html
    python -msphinx -b doctest -d {envdir}/.cache/doctrees docs {envdir}/.cache/doctest

We'll map Travis environments to Python interpreters manually, using the [travis] section, and then we'll pass our dependencies into [testenv]. We need both pytest and sphinx in the virtualenv that will be generated by tox. Finally we'll specify a set of commands: pytest runs our unit tests, python -msphinx -b html... runs the HTML builder and python -msphinx -b doctest... runs the doc test runner. All the output will later show up in Travis. We can now run tox locally by calling it on the console.

$ tox

The missing piece is our Travis configuration file. Travis runs different Python builds in different containers, which is what we want: Python 2.7 runs in one container and Python 3.6 in another. This is a little redundant with tox, which can also spawn virtualenvs for different Python versions. What we need is a Travis file that triggers one build per Python version; the tox file above then simply picks up the current Python version and builds with it. Here's the .travis.yml.

sudo: false
language: python
python:
  - "2.7"
  - "3.6"
install: pip install tox-travis
script: tox -vv

The [travis] section in tox.ini is read by the tox-travis plug-in we installed; it maps the Travis Python version to the matching tox environment.

Conclusion

You can find the build status of this project on the Travis site here.

That wraps everything up: Travis calls tox, which runs the test runners and doc builds. I hope you can adapt this scheme to your own projects.

Image credits: I, Avenafatua [GFDL (http://www.gnu.org/copyleft/fdl.html), CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/) or CC BY-SA 2.5 (https://creativecommons.org/licenses/by-sa/2.5)]


Creating a Custom Landing Page in Sphinx

Sample code lives here. When I refer to source code line numbers, you're supposed to look at the code in this repository. Sample output (courtesy of readthedocs.org) is at https://sphinx-landing-template.readthedocs.io/en/latest/.

By convention the landing page for your Sphinx project is a simple table of contents. Nothing prevents you from adding a custom logo. Can you add a landing page like Jupyter has, with custom HTML and CSS, with full control over how the page is laid out? Yes you can.

I used sphinx-quickstart to create a boilerplate template for my code. I didn't bother configuring intersphinx or any extensions I didn't need. I did let Sphinx know that I want to use both .rst and .md suffixes, which is not necessary for this exercise.

I opted to separate source and build directories. I also amended the exclude_patterns config variable by adding '_build'.
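
The relevant line in conf.py is just this:

# conf.py -- keep the build output out of the source scan
exclude_patterns = ['_build']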

The idea is that we move the table of contents to a separate file. Then we create a custom index page by telling Sphinx to render the page index via our template index.html. In index.html we can use the full power of HTML. We will subclass the existing layout.html from the Alabaster theme we're using. If you want to, you can go all out and design a fully customized landing page that does not inherit from layout.html.

Switching index.rst

If you used sphinx-quickstart, your index.rst contains the table of contents. Rename this file to contents.rst now. Make this change known to Sphinx by changing the property master_doc in conf.py to contents (line 45 in conf.py). With the Alabaster theme, setting master_doc in conf.py links the project title at the top left to our table of contents page.
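
In conf.py this is a one-line change:

# conf.py -- the table of contents now lives in contents.rst
master_doc = 'contents'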

Next we let Sphinx know that we still want to generate the index page, but with a different template. Add the following statement to your conf.py (line 113):

html_additional_pages = {'index': 'index.html'}

This will prompt Sphinx to render the page index using the template index.html. Since index.html is not a default template, Sphinx will only find it in our project's source/_templates/ folder.

Creating index.html

Now create a file named index.html in source/_templates; Sphinx will instantiate it to create our index page. Sphinx templating is a topic of its own, so I'll just describe what I did in this example.

The first line of index.html tells Jinja, the Sphinx template processor, to inherit from the template named layout.html. This gives us much of the basic layout that is provided by Alabaster and leaves us to fill in the title and body of this template.

The second line of index.html sets the title. The underscore notation is used for I18N (internationalization), which provides a convenient entry point for translating your project later on.

The main part starts in line three. We define the block body there, which supplants the block of the same name from Alabaster's layout.html template. Double curly braces indicate expressions to Jinja, while {% ... %} indicates directives. In both of these constructs we can use the underscore notation for I18N. It's also possible to call functions that are defined as part of the template's context. You see the function pathto() being used; it resolves a relative path in the project's source directory to a path that your browser understands.

pathto() accepts the path to a document as its argument. For an .rst document, don't use the .rst extension; just write the name of the doc. You see this used in pathto("contents") when we refer to the table of contents.

If you want to include images, refer to them like pathto('_static/images/...') in line 8 of index.html. This assumes that you place your images into the folder source/_static/images in your project's root folder.
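
For this to work, the _static directory has to be listed in html_static_path; the sphinx-quickstart boilerplate sets this up by default:

# conf.py -- directories whose contents are copied verbatim into the HTML output
html_static_path = ['_static']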

Adding custom CSS

Custom CSS can be added via a conf.py extension. This is a way of customizing Sphinx on a per-project basis. Define a conf.py extension by adding a setup(app) function to your conf.py; Sphinx will call it like the setup(app) function bundled with any third-party extension.

In line 172 of conf.py we use this extension point to inject a custom stylesheet into the Sphinx build:

def setup(app):
    # add_stylesheet() was renamed to add_css_file() in Sphinx 1.8
    app.add_stylesheet('css/custom.css')

You could do much more, but this suffices to add a custom CSS file to the output of the HTML processor. This CSS will be available in all of the project, including index.html.

Summary

The idea behind this article is to create custom landing pages. These should allow for a large degree of customization, while still tying into the Sphinx workflow. With the ability to use custom HTML and CSS you can ask a design-minded person to create your landing page without them needing to understand reStructuredText. I hope this is helpful for you when creating beautiful landing pages.

References

Jupyter landing page: https://jupyter.readthedocs.io/en/latest/install.html
sphinx-quickstart: https://www.sphinx-doc.org/en/master/usage/quickstart.html
Custom CSS or JavaScript: https://docs.readthedocs.io/en/latest/guides/adding-custom-css.html
Sphinx templating: https://www.sphinx-doc.org/en/master/templating.html
Jinja templates: http://jinja.pocoo.org/

Image credits:

United States Geological Survey (USGS) [Public domain], via Wikimedia Commons https://commons.wikimedia.org/wiki/File:Grand_Strand_Airport_-_South_Carolina.jpg

Readings on Group Dynamics and Self Improvement

I've had three books next to my bed during the last week. Let's start with the worst one.

Working out Loud

"Working out Loud" is supposedly a method to make your work visible among a wider circle, with the intention of garnering career advancement and/or happiness. The primary tools of this method are blogs and social media.

The book's author frequently refers to Dale Carnegie's "How to Win Friends and Influence People." On multiple occasions he pitches his work as "Dale Carnegie for social media". I read Carnegie 20 years ago. I was fascinated by the materialistic outlook on human relations. Maybe I'm too much of a romantic for this to ring true to me.

Some parts of his narrative intrigue me. For example when he asks, "How many people do you know that do the same job as you do, outside of your circle of friends?" But overall this is a weak book, trying to modernize its shining example, Carnegie.

I didn't like the original, and I don't like this crude copy. I moved the book to the shelf after 150 pages.

The Surprising Power of Liberating Structures

This is a print edition of material that is available online at liberatingstructures.com. It does not provide anything beyond what you will find online. That said, I'm a dead-tree lover and find real books easier to read when spending an afternoon in bed.

Liberating Structures is concerned with little activities or games that help groups work collaboratively. See for example 1-2-4-All, a way to pose a problem to a group of people and gather input from everyone.

I used to despise these kinds of group dynamics games, later grew indifferent, and now I'm a firm believer in some of these techniques. In themselves they're quaint little games that seem ridiculous. I would assume that grown-up people don't need a 'game' to express their needs and desires, or to talk about how they feel. But this isn't the case.

Especially in a workplace environment, people will often not say what they think; they will zone out or agree with whoever they think is the leader. Liberating structures are a way to circumvent this behaviour and unleash more potential in your group.

I'm not part of leadership and not an agile coach, but this little book helps our group at work to perform better and be more inclusive. Having the printed version is a nice perk, if you're talking to a colleague about a structure and don't want to gesticulate at the screen all the time.

Recommended.

The Facilitator's Guide to Participatory Decision-Making

This is a large format paperback book, published by Wiley in their management series. It starts off by giving a short history of what facilitation means and then heads straight to teaching you about group dynamics.

The book is written with the express intent that its pages may be photocopied and used during group sessions. The target audience is not limited to managers or facilitators. I think everyone who is interested in participatory decision making, every member of a group that wishes for an inclusive decision process, will find something interesting in this book.

The authors take the position that not every meeting will lead to a decision. They advocate that the process that leads to a decision needs to work for every member of the group; that differences in opinion may be present even after a decision process, but good groups will tolerate divergent positions and will thrive nonetheless.

The 'Guide' gives a series of tools for people who facilitate meetings (not necessarily full-time facilitators), tools that aim to build better groups and better individuals. The authors try to avoid situations where groups drift toward the lowest common denominator, instead encouraging excellence.

Recommended.

OSM Tile Creation on AWS Spot

This article describes some efforts to automate OSM tile creation on AWS spot instances.

I'm running a website for a regulars' table in my city. As part of this regulars' table we keep a database of all the pubs we have visited, because we're in a different pub every Tuesday. Why not visualize this via OpenStreetMap? We wanted to create a nice slippy map with all the pubs we have been to in the last 15 years. But OSM advises against using their tile server, because they don't have many resources available for the infrastructure.

So the logical thing was to create a custom tileset on AWS spot and put it on S3 to serve to our website's visitors. This article describes the automation that went into spinning up a spot instance and downloading the generated tiles to my local machine. AWS spot is a "spot market" for AWS instances that offers deep discounts on infrastructure. However, if you host on AWS spot, your machines might go away at any time without notice. Spot is dynamically priced, so we're essentially one bidder among many.

The code

The code is available on Github at nomadenorg/osmtilegen. There is a set of scripts:

  • gen-tiles.py is our custom Python script to generate the tiles. This limits tile generation to the Hamburg (our city) bounding box.
  • mapgen.sh runs on the instance; it installs carto and mapnik, as well as PostgreSQL 10 and some utilities.
  • request.py is a boto3 script that makes a spot request and prints the instance's public hostname to stdout (a sketch of what this looks like follows below).
  • spot-spec.json is the spot instance launch specification; it lets you customize what instance config you'll end up with.
  • run.sh is the main driver that integrates it all.
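
To give an idea of what request.py does, here is a minimal boto3 sketch of a spot request; the exact contents of spot-spec.json (AMI, instance type, key pair and so on) are placeholders you fill in yourself:

import json

import boto3

ec2 = boto3.client('ec2', region_name='eu-west-1')

# The launch specification lives in spot-spec.json, so the instance
# configuration can be changed without touching the code.
with open('spot-spec.json') as f:
    launch_spec = json.load(f)

response = ec2.request_spot_instances(
    InstanceCount=1,
    Type='one-time',
    LaunchSpecification=launch_spec,
)
request_id = response['SpotInstanceRequests'][0]['SpotInstanceRequestId']

# Wait until the request is fulfilled and an instance has been launched.
ec2.get_waiter('spot_instance_request_fulfilled').wait(
    SpotInstanceRequestIds=[request_id])

fulfilled = ec2.describe_spot_instance_requests(
    SpotInstanceRequestIds=[request_id])
instance_id = fulfilled['SpotInstanceRequests'][0]['InstanceId']

# Print the public hostname so the calling script can ssh into the machine.
reservation = ec2.describe_instances(InstanceIds=[instance_id])
print(reservation['Reservations'][0]['Instances'][0]['PublicDnsName'])

Using a waiter keeps the script free of hand-rolled polling loops.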

When you start run.sh it will run for approximately 30 minutes to generate the file tiles.tar, which is then downloaded to the local directory. In our case this is around 250 MB in size. The cost for a complete tile set is around $0.50 when I run it in eu-west-1.

Extension points

There are some points where you can customize this workflow.

First you will want to roll your own gen-tiles.py to adapt the bounding box to your needs. You will also have to change the downloaded extract from the Hamburg extract to something else, maybe even the complete planet.

After you've done this you will probably want to start off with the OpenStreetMap Carto theme and customize it to your liking.

Both adaptations are rather easily done with the current setup.

Image credit: By GT1976 [CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)], from Wikimedia Commons

Printing to remarkable cloud from CUPS

A while back I wrote a tool to print directly from CUPS to the reMarkable cloud. This is a short description of how it works. I like being able to skim the news in the morning, print directly to my reMarkable, and then read while riding the train or during lunch. This workflow really works for me and avoids senseless meandering and surfing as much as possible. I dial in my news consumption in the morning and then just read what is interesting during the day.

So this is a custom backend for CUPS. It consists of a drv file that is used to emit the PPD for CUPS. This PPD tells CUPS that the backend only accepts PDF, and it also tells CUPS about the specific dimensions of the reMarkable device.

The other part is a small shell script I mostly adapted from some sample code. This backend uses the rmapi client to connect to the reMarkable cloud. It then places the PDF generated by CUPS into the folder specified when the printer was created. You won't get prompted for where to place the file; it's predefined at printer creation time. If you really need multiple places to put your stuff, you can create multiple printers in CUPS.

So this is the way my morning paper moves from screen to screen. Everything ends up in my reMarkable /Print/Home folder. If I think an article won't be useful anymore, I just delete it; otherwise I move it to a different folder.

And if I have some time on the weekend, I'll explain how to thread this into the CUPS provided by Guix.

The code is in my scratch repo on GitHub.

Cover Image: By VGrigas (WMF) [CC BY-SA 3.0  (https://creativecommons.org/licenses/by-sa/3.0)], from Wikimedia Commons

AnkiDroid audio with Python and Google

I've been learning Czech recently. My Anki cards have my native tongue, German, on the front and Czech on the back. When listening to Czech audio, I took a really long time to translate simple words and grappled with the conjugation of Czech verbs. The remedy to this situation appears easy: create a deck with Czech audio and text on the front and German on the back of the card.

Anki is an open-source product, and it's not too hard to dig up some info on how the Anki 2 database is formatted. An article by Julian Sobzak here explains the content of the .apkg format. It's a zip file with raw audio files (usually MP3) in a folder and an SQLite3 database that contains all the text. The article also lists all the database fields you need to extract the info from the cards.

Getting the info from the database is conceptually simple, assuming you have unzipped the apkg and now have the SQLite database collection.anki2 in the local directory.
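
The unzip step itself can also be done from Python; a minimal sketch, assuming the exported deck is called deck.apkg:

import zipfile

# An .apkg file is a plain zip archive; extracting it yields
# collection.anki2 plus the numbered media files.
with zipfile.ZipFile('deck.apkg') as apkg:
    apkg.extractall('.')

With collection.anki2 extracted, the query itself looks like this: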

import sqlite3

conn = sqlite3.connect('collection.anki2')
c = conn.cursor()

for row in c.execute('''SELECT guid, flds from notes'''):
    guid = hex(hash(row[0]))
    deu, cze = row[1].split('\x1f')

The flds column contains the fields of the notes (more on that later). Split flds by the character 0x1f and you have its constituent parts. Here I hash the GUID of the card to create a string that will serve as the file name.

There is another project named genanki by Kerrick Staley over on GitHub. This Python 3 library helps you generate a new deck from the newly created data. If you're familiar with AnkiDroid, the only new notion will be that of a Model. A Model serves as a template; a Note is the actual content that is passed into the template when it is rendered. So we'll have a generic card that takes two parameters, the front and the back content.

deck = genanki.Deck(2059400110, 'Tschechisch Deutsch (Talki)')
package = genanki.Package(deck)
model = genanki.Model(
    1607392319,
    'Simple Model',
    fields=[
        {'name': 'Czech'},
        {'name': 'German'},
    ],
    templates=[
        {
            'name': 'Card 1',
            'qfmt': '{{Czech}}',
            'afmt': '{{FrontSide}}<hr id="answer">{{German}}',
        },
    ])

mediafiles = []

for ... :
    ...
    mediafiles.append('{}.mp3'.format(guid))
    my_note = genanki.Note(
        model=model,
        fields=['{}, [sound:{}.mp3]'.format(cze, guid),
                deu])
    deck.add_note(my_note)

# attach the generated MP3 files, otherwise they won't end up in the .apkg
package.media_files = mediafiles
package.write_to_file('output.apkg')

Finally we need a Text-to-Speech system to render the Czech words. Czech is not universally available from TTS providers. Notably AWS Polly can not do Czech at this point in time. Google GCE can do Czech output, but you can't get a GCE account without a company if you're from Europe.

Fortunately there's a way around that: The excellent gTTS by Pierre-Nick Durette enables you to dump MP3 output directly from Google Translate.

    # gTTS comes from the gtts package: from gtts import gTTS
    tts = gTTS(cze, lang='cs')
    with open('{}.mp3'.format(guid), 'wb') as f:
        tts.write_to_fp(f)

That's it. The full raw script is over at GitHub in my scratch repository.

Feel free to adapt.

AWS Cloud images for Guix

We had a good, inspiring orchestration meeting at the Guix day pre-FOSDEM and as part of that I've been working on making public AWS cloud images available for Guix. This post serves as a status report, since we're not quite there yet.

The code that runs the builds is available on GitHub. The idea behind this is to let people build their own customized Guix images for the cloud, in addition to building a golden master.

The strategy I employ to get a working Guix image is very simple. I start up an Ubuntu 16.04 machine, then download the latest binary Guix release and install it on Ubuntu. I run an improvised cow-store.sh script that emulates the cow-store service from the Guix installer. When that's done, I run guix system init config.scm /mnt to install on an additional attached disk.

This process ends up with a Guix system on the disk. I then make a snapshot of this disk and create an AMI from this snapshot.

All the AWS plumbing in this process is highly automated using HashiCorp's Packer. In fact, about the only things that needed additional automation are the formatting of the disks and the triggering of the Guix install process.

When this process has run the result is a bootable Guix image.

But there's more to it. AWS provides a way to inject SSH keys into newly created instances. Technically, AWS provides the public key in the metadata store, which can be queried by every instance. The instance is then supposed to include this key as part of its ssh setup.

The most recent change I made to the images was to include a service that queries the metadata store at boot time and places the key in the ssh configuration for the user alyssa.
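
The service itself is defined in Scheme as part of the Guix system configuration, but conceptually it does little more than this Python sketch (the user name alyssa comes from the image config, the URL is the standard EC2 metadata endpoint):

import os
import urllib.request

# Standard EC2 metadata endpoint for the key pair chosen at launch time.
METADATA_URL = ('http://169.254.169.254/latest/meta-data/'
                'public-keys/0/openssh-key')

def install_ssh_key(user='alyssa'):
    key = urllib.request.urlopen(METADATA_URL, timeout=5).read().decode()
    ssh_dir = '/home/{}/.ssh'.format(user)
    os.makedirs(ssh_dir, exist_ok=True)
    # Append rather than overwrite, so keys baked in at build time survive.
    # A real implementation would also fix ownership and permissions here.
    with open(os.path.join(ssh_dir, 'authorized_keys'), 'a') as f:
        f.write(key.rstrip() + '\n')

if __name__ == '__main__':
    install_ssh_key()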

You don't need to use this mechanism. In fact you can easily inject your own keys into the images at build time, disregarding the AWS provided public keys.

In some setups it might be required to build images with predefined or even no keys, in other cases the AWS provided functionality might be useful. In all cases this is a feature that people have come to expect from images on AWS.

This is where we stand now. Simply running packer in a guix-packer checkout will give you a working Guix cloud image. While being bootstrapped from Ubuntu might seem like a bad idea, I actually prefer to do it this way. Building in the cloud is in many ways faster and easier than building at home and hauling an 8 GiB image across a residential internet connection.

Towards public images

The next steps towards providing public images (just click on GuixSD and be done) will roughly be the following.

I think we need a second provisioning step that does a guix pull && guix system reconfigure to get the latest package content into the image, especially because this step can eat up a lot of time.

All ssh host keys need to be removed and a service that improves the availability of randomness needs to be built into the system.

We need a way to extend the Guix config to allow provisioning of additional disks, when these have been attached during instance creation.

There's much more to do: console screengrabs currently don't work, system logs are not displayed in the AWS console, and so on. On the other hand, the Elastic Network Adapter is supported by default in Linux-libre and we enable it as part of the packer config, which makes it possible to use the most recent AWS instance generation (e.g. C5 instances).

I hope to deliver public GuixSD images for AWS in the coming weeks.

Walk on a Sunday, na, scrape distrowatch

A friend asked me today if I could do some Perl programming for him. I'm a bit rusty in Perl, but I still understand it well enough to port it to Python. The source module was on CPAN and basically scraped all of the metadata off of distrowatch.com. The new module lives on PyPI as pulldistros. It will scrape most of the metadata.

The main libraries used are BeautifulSoup and requests.

One nice thing to note: BeautifulSoup allows you to use different parsers. The default Python HTML parser will interpret something like

<ul>
  <li>foo
  <li>bar
</ul>

as a tree, where each <li> element sits below the previous one. The html5lib parser, however, is much closer to how a browser interprets this; its output lists all <li> tags as siblings.
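
You can see the difference by feeding the same snippet to both parsers (html5lib has to be installed separately):

from bs4 import BeautifulSoup

markup = "<ul><li>foo<li>bar</ul>"

# The bundled parser nests the second <li> inside the first one ...
print(BeautifulSoup(markup, "html.parser").prettify())

# ... while html5lib closes the open <li> first, like a browser would,
# so both <li> tags end up as siblings (wrapped in html/head/body).
print(BeautifulSoup(markup, "html5lib").prettify())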

A nice gem Tobi pointed me to is requests_cache, which creates a transparent cache for any requests that are sent out by requests. This speeds up testing a lot, since I had to run around 1000 requests to test the whole setup.
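
Setting it up is a one-liner; the cache name below is arbitrary:

import requests
import requests_cache

# Transparently cache every request made through requests in a local
# sqlite file, so repeated test runs hit distrowatch.com only once.
requests_cache.install_cache('distrowatch_cache')

resp = requests.get('https://distrowatch.com/')
print(resp.from_cache)   # False on the first run, True on later runs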

Web scraping with Python is surprisingly fun.

The source lives on Github here.

Cover image: Matthew Harrigan [CC BY-SA 2.0 (https://creativecommons.org/licenses/by-sa/2.0)], via Wikimedia Commons, https://commons.wikimedia.org/wiki/File:Sandcastle1.jpg