Using boto3, moto and freezegun in a Python test

Recently I worked on a Python script to monitor and delete objects inside an S3 bucket.
I published an excerpt on GitHub: python-projects/delete-s3-objects

The code in the repo mostly consists of the cleanup() function. The function uses boto3 to connect to AWS, pulls a list of all the objects contained in a specific bucket, and then deletes all the objects older than n days.
I have included a few examples of creating a boto3.client, which is what the function expects as its first argument. The other arguments are used to build the path to the directory inside the S3 bucket where the files are located. In AWS terms this path is called a Prefix.

As the number of objects in the bucket can be larger than 1000, which is the limit for a single request in GET Bucket (List Objects) v2, I used a paginator to pull the entire list. Object removal follows the same principle and processes batches of up to 1000 objects.
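The paginate-then-delete-in-batches idea can be sketched roughly as follows. This is not the cleanup() function from the repo: cleanup_sketch, chunked and the now parameter are hypothetical names chosen for illustration. The only boto3 calls assumed are get_paginator("list_objects_v2"), paginate(Bucket=..., Prefix=...) and delete_objects(Bucket=..., Delete={"Objects": [...]}), and the client is passed in so the sketch itself needs nothing beyond the standard library.

```python
from datetime import datetime, timedelta, timezone


def chunked(items, size=1000):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def cleanup_sketch(client, bucket, prefix, days, now=None):
    """Delete objects under `prefix` older than `days` days.

    Hypothetical helper (not the repo's cleanup()).
    Returns the number of keys reported as deleted.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)

    # A single list_objects_v2 call returns at most 1000 keys,
    # so walk every page the paginator yields.
    paginator = client.get_paginator("list_objects_v2")
    old_keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # An empty page has no "Contents" entry at all.
        for obj in page.get("Contents", []):
            if obj["LastModified"] < cutoff:
                old_keys.append({"Key": obj["Key"]})

    # delete_objects accepts at most 1000 keys per request.
    deleted = 0
    for batch in chunked(old_keys):
        response = client.delete_objects(Bucket=bucket, Delete={"Objects": batch})
        deleted += len(response.get("Deleted", []))
    return deleted
```

Passing now explicitly is a design choice made for this sketch only: it keeps the age comparison testable without touching the clock.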

Now this was all good fun but the really interesting part was creating a proper unittest.

After some searching I found moto, the “Mock AWS Services” library. It is brilliant!
Using this library the test will mock access to the S3 bucket and create several objects in the bucket. You can leave the dummy AWS credentials in the script as they won’t be needed.

At this point I wanted to create multiple objects in the S3 mocked environment with different timestamps, but unfortunately I discovered that this is not possible. Once an object is created in S3 the date of creation metadata cannot be easily altered, see here for reference.

Cue another awesome library called freezegun. The test uses freeze_time to mock the date/time and create S3 objects with different timestamps, so that we can safely experiment with the logic of the cleanup() function (‘delete objects older than n days, leave everything else within the prefix’).

$ python test_script.py 
mock-root-prefix/mock-sub-prefix/test_object_01 2019-08-29 00:00:00+00:00
mock-root-prefix/mock-sub-prefix/test_object_02 2019-08-28 00:00:00+00:00
mock-root-prefix/mock-sub-prefix/test_object_03 2019-08-27 00:00:00+00:00
mock-root-prefix/mock-sub-prefix/test_object_04 2019-08-26 00:00:00+00:00
mock-root-prefix/mock-sub-prefix/test_object_05 2019-08-25 00:00:00+00:00
mock-root-prefix/mock-sub-prefix/test_object_06 2019-08-24 00:00:00+00:00
<class 'botocore.client.S3'>
Cleanup S3 backups
Working in the bucket:         my-mock-bucket
The prefix is:                 mock-root-prefix/mock-sub-prefix/
The threshold (n. days) is:    4
Total number of files in the bucket:     7
Number of files to be deleted:           3
Deleting the files from the bucket ...
Deleted:        3
Left to delete: 0
.
----------------------------------------------------------------------
Ran 1 test in 0.798s

OK

I am on TPTM podcast

Yes, I am on the Talk Python To Me podcast, Episode #174: Coming into Python from another Industry (part 2).

Now I have a very good excuse to brush up my GitHub and start adding all the stuff I have been working on.

My first script

I have finished my first real attempt at Python.

Nothing too complicated, just a simple script:

  • Python 3 (of course!)
  • minimal modules for max portability
  • pull information from Xymon and parse it
  • pull RAID and disk information using omreport and parse it to obtain a list of disks failed/in predictive failure, serial numbers, etc.
  • generate a report and print a template
  • follow the disk rebuild when the option -p is used, wrap the screen refresh in curses and wait for ‘q’ to be pressed then exit

I am sure it can be improved and made a lot better, and I am equally sure I made some horrible mistake somewhere. But it’s a start.

Find it on GitHub: https://github.com/markgreene74/smallprojects/blob/master/failed_disk.py

Now that this project is finished I can focus on the #100DaysOfCode in Python.

TIL Python3 functions all() and str()

All this time, the solution to my problems was just in front of my eyes.

While I am procrastinating getting ready for the 100DaysOfCode challenge, I am working on a project to rewrite a bash script in Python3.

Today I solved two problems in one go. I am sure the experienced coder would have done it in a minute and with time to spare, but hey that’s how we learn.

My first problem was to nicely transform a list of strings in a way that would make it easier to run multiple regex searches on it.

The list looks like this:

ID : 0:0:2
Status : Non-Critical
Name : Physical Disk 0:0:2
State : Online
Failure Predicted : Yes
Progress : Not Applicable
Bus Protocol : SAS
Media : HDD
(...)

It is the result of a command (omreport storage pdisk controller=0) that gathers information about the disk status.

After a closer look I discovered that each item of the list is a bytes object (b'ID : 0:0:2') and needs to be transformed to a string type. Also, I wanted to make a nice block for each disk.

This did the trick:

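# omreport is the list of bytes lines returned by the command
disk_string = ""
for i in omreport:
    # decode each bytes item into a str
    disk_string += str(i, 'utf-8')
# an empty line separates one disk from the next
blocks = disk_string.split("\n\n")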

Note the ‘utf-8’ encoding. More here.
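An equivalent way to do the same conversion is to join the bytes first and decode once. The sketch below uses a made-up sample in place of the real command output: raw_lines and its contents are assumptions, including the assumption that each item keeps its trailing newline and that an empty line separates the disks.

```python
# Hypothetical sample standing in for the real omreport output:
# each item is a bytes line ending in a newline, with an empty
# line between one disk and the next.
raw_lines = [
    b"ID : 0:0:1\n",
    b"Status : Ok\n",
    b"State : Online\n",
    b"\n",
    b"ID : 0:0:2\n",
    b"Status : Non-Critical\n",
    b"State : Online\n",
]

# Join the bytes, decode once, then split into one block per disk.
disk_string = b"".join(raw_lines).decode("utf-8")
blocks = disk_string.split("\n\n")

print(type(blocks[0]))
print(len(blocks))
```

Note that split("\n\n") consumes the separator, so every block except the last loses its final newline. That matters if a regex later expects a trailing \n on the last line of a block.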

Now blocks is a list of multi-line strings (real strings!) that can be processed with re.findall:

found1 = re.findall(r'^ID\s+\:\s(.*)\n', block, re.MULTILINE)
found2 = re.findall(r'^Status\s+\:\s(.*)\n', block, re.MULTILINE)
found3 = re.findall(r'^State\s+\:\s(\w+)\n', block, re.MULTILINE)

From here we can build a tuple formed by all the information needed and work with that.

found = (found1, found2, found3)

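But the last bit is: how can we make sure that the tuple is not empty? How can we throw away something like this: ([], [], [])?

And here comes all() to the rescue: it returns True only if every element is truthy, and an empty list is falsy, so a tuple containing any empty list is rejected. More here.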

This, again, did the trick:

if all(found):
    result.append(found)
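Putting the pieces together, here is a small self-contained sketch of the regex search plus the all() filter. The three blocks are made up for illustration (they are not real omreport output): one complete disk, one header-like block with no disk fields, and one disk with the State line missing.

```python
import re

# Hypothetical blocks, for illustration only.
blocks = [
    "ID : 0:0:2\nStatus : Non-Critical\nState : Online\nMedia : HDD\n",
    "some header text without disk fields\n",
    "ID : 0:0:3\nStatus : Ok\n",
]

result = []
for block in blocks:
    found1 = re.findall(r'^ID\s+\:\s(.*)\n', block, re.MULTILINE)
    found2 = re.findall(r'^Status\s+\:\s(.*)\n', block, re.MULTILINE)
    found3 = re.findall(r'^State\s+\:\s(\w+)\n', block, re.MULTILINE)
    found = (found1, found2, found3)
    # all() is True only when every list in the tuple is non-empty,
    # so both ([], [], []) and a partially filled tuple are skipped.
    if all(found):
        result.append(found)

print(result)
```

Only the first block survives: the header block gives ([], [], []) and the third block gives an empty list for State, so all() is False for both.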

100DaysOfCode checklist

I am about to start my 100DaysOfCode challenge in Python.

I am assuming everyone is familiar with the concept, but if you want to know more here’s some reading/listening material:

I will be following the course “#100DaysOfCode in Python” by Michael Kennedy, Bob Belderbos and Julian Sequeira.

This is my pre-flight checklist:

  • ✓ setup a dedicated python3 environment on my DigitalOcean development droplet
  • clone the course GitHub repository
  • ✓ tune in to an online radio to help focus (either SomaFM: Groove Salad or Radio Swiss Classic)
  • ✓ dust off my website (last post is dated 2015!) and my Twitter account
  • code

(mostly a reminder to myself)