I’m a desktop Linux user who has used a Mac for a while because Linux doesn’t really work on laptops. Thinking that perhaps Linux ought to have caught up, I bought a Lenovo Yoga and installed Arch on it. Arch can’t handle the screen resolution or the tablet mode, and the battery only lasts 4 hours, so let’s try Windows 11.
It’s necessary to create a bootable USB stick. There is a download page with an ISO image, but that image is for optical disks. If you put it onto a USB stick the firmware will recognise it but the Windows bootloader won’t (it complains about missing drivers during the process). To create a USB installer you have to go through the Windows media-creation process. I used a virtual machine, although apparently it is possible to use woeusb too.
The installation creates four partitions: an EFI system partition, a Microsoft Reserved (MSR) partition, the main Windows partition, and a recovery partition.
The installation process requires you to use a Microsoft account; this gives you an Admin account that is tied to this MS account. It’s possible to convert to a local account but the home path remains as an abbreviation of the MS account name. However, you can create a new local account then use that account to delete the MS one.
Before doing much more it’s appropriate to encrypt the disk. The current way is to enable BitLocker. For me this created an extra recovery partition on top of the one that the MS install created, so now there are five partitions.
There is a terminal app that knows about several shells. Two pertinent ones are:
cmd.exe, the original command prompt
PowerShell, the modern replacement
One thing to know about PowerShell is that it functions on aliases; things like ls
are aliases for complicated-looking calls to some API. You can add your own in a file:
~\Documents\WindowsPowerShell\profile.ps1
with the format
set-alias less more
Windows 11 has a package manager; awesome, use it! The syntax is along the lines of
winget search thunderbird
and then
winget install thunderbird
and it knows about quite a lot of packages. The syntax can be wordy, but basically it works well. This gets you things like Zoom too. Also Visual Studio Code; I think I typed
winget install vscode
but I could be wrong.
Windows 11 comes with WSL: Windows Subsystem for Linux. This is a lightweight virtual machine that runs a standard Linux distribution with a slightly modified kernel. From PowerShell you can type
wsl --list --online
which lists the available distributions, then something like
wsl --install -d Debian
Thereafter you can type bash
in PowerShell and Debian will appear. The terminal app also gains a configuration to run Debian directly. The Windows filesystem is visible as /mnt/c
and the Linux filesystem shows up in the Windows file manager.
I’m told that you can then install more or less anything you want under Linux. However, for me this is missing the point; it’s Windows so let’s see what works native.
VSCode is quite well integrated with Git, but Git itself isn’t installed. However, SSH is. To start the OpenSSH agent, find the Services app, scroll to the OpenSSH entry and right click Properties. Select Automatic (Delayed Start)
. The config is in ~/.ssh
. After a restart you can then say ssh-add
as under Linux.
For Git, run winget install git.git
. This gets you a Windows version of Git. The gotcha is that it comes compiled with its own version of SSH. It’s necessary to force it to use the “native” one by setting the environment variable GIT_SSH
; this can be done on the command line:
setx GIT_SSH C:\Windows\System32\OpenSSH\ssh.exe
Now Git and SSH should work.
Many references suggest using texlive
. However, I find that MikTeX is better:
winget install miktex
VSCode now works fine with LaTeX.
The “native” PDF viewer is Edge. However, it’s a bit heavy and it doesn’t automatically reload the document when it changes. The current solution is SumatraPDF:
winget install sumatrapdf
which seems to work fine with LaTeX, except for the ugly yellow logo.
Emacs: Generally Not Used Except by Middle Aged Computer Scientists; however, still very useful. Install with winget
. By default it has no idea where your home directory is, but it understands HOME
, so
setx HOME $env:USERPROFILE
It still doesn’t default to loading from there, but it understands ~
, so not so painful. auctex
works well with MikTeX.
For software development you can install Visual Studio, and it’s enormous. For command-line work it’s possible to install just the build tools. For this, I think I did
winget install Microsoft.VisualStudio.2022.BuildTools
This gives you a graphical installer from which you can install just the C++ compiler. I’m still a bit lost with this; for instance, Visual Studio includes cmake and ninja depending on how much of it you install. I have at least the cl.exe
compiler and nmake
, added manually to the path:
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.34.31933\bin\Hostx64\x64
which seems stupidly long, but does appear to “work”. Right now nmake
works with LaTeX, but I’ve yet to get C++ compilation up and running.
This process adds another option to the terminal app: “Developer VS 2022” for both cmd
and PowerShell. This in turn is just a shell with the dev tools on the path.
NVidia chips and CUDA infrastructure are complicated. There are at least five different things that need to be compatible before a program will run:
1. The hardware, with its compute capability.
2. The kernel driver.
3. libcuda.so (see below).
4. The CUDA toolkit, which provides libraries such as libcublas.so and carries the familiar version number, e.g., 11.3.
5. The application itself.
Of the above, 1 and 2 are kernel-side things; the system manager will make sure that the kernel driver is compatible with the hardware.
Items 4 and 5 are user-side things. Any package manager, notably conda, will take care of that compatibility; any application in conda will have a dependency on the appropriate toolkit. Of course you need to make sure that the toolkit matches the hardware capability.
I single out item 3, libcuda.so
, because it’s something that is associated with the application and toolkit. It needs to be present for an application to link. However, libcuda.so
is distributed by NVidia with the driver, item 2. Following the links in Debian, libcuda.so.1 -> libcuda.so.510.39.01
. Bizarrely, an application cannot link unless the driver package is installed on the machine.
That last requirement can be averted by a package called nvidia-compat
. The compatibility package is actually meant to enable new applications to work on older drivers, but it happens to contain a version of libcuda.so
that should allow an application to link when otherwise lacking a driver installation.
The kernel-side installation need not coincide exactly with the user-side. For instance, kernel driver 510.39.01 is a CUDA 11.6-capable driver, but works perfectly well with CUDA toolkit 11.X and several previous versions. This is important in the context of compatibility with applications: e.g., pytorch currently supports the 11.3 toolkit, but had issues with 11.2. This was independent of the driver.
Back in the day, in order to install Linux, I would partition a hard drive with four partitions: boot, swap, root and home.
It was BIOS, so it only knew about MBR, and MBR only supported four (primary) partitions. It all made sense. All the partitions were formatted ext3 or ext4, except perhaps the boot partition that didn’t need a journal.
Since about 2010, a few things changed:
- BIOS gave way to UEFI and MBR to GPT, so the four-partition limit is gone; the boot partition became the FAT32 EFI system partition.
- LVM can carve root, home and swap out of a single physical partition.
- The bootloader of choice is now systemd-boot. It comes with the OS; use it.
So, there are still four partitions. But the LVM point also enables encryption: LVM can sit on top of LUKS. So the second GPT partition is first encrypted as one big block, then LVM further partitions things once decrypted.
The boot partition had to be FAT32 (no problem); I tended to format root and home as BTRFS, but used it simply as a basic filesystem.
In the incarnation above, we still have a swap partition and two main partitions. However, in 2022 (but I am late!) there are two other important considerations: BTRFS subvolumes, and the snapshot machinery built on top of them.
The way BTRFS mimics the partitions that LVM would normally manage is by use of subvolumes. A subvolume is not a partition; it’s just a separate filesystem in the same “namespace”. Although I can’t find an authoritative answer, it is reasonable to assume that a hardware error should only affect one subvolume. As long as that assumption holds then there is no need for partitions other than boot and “main” (I tend to name it “data”).
One more consideration is that my machines tend to contain two hard disks. Typically this might be an SSD with boot, root and home, and a larger HDD with “common” things. Common things can be a media library or backup partition. This leads to the root partition containing two subvolumes acting as mountpoints: @home
with the home directories, and @data
with assorted common things.
A key tool that has become part of the BTRFS infrastructure is snapper. The BTRFS literature makes a big thing of subvolumes and the ability to make snapshots of them. The OpenSuse infrastructure in particular is built around the root partition being snapshotted. This in turn results in lots of other subvolumes that only exist to prevent them being snapshotted as part of the root subvolume. If (read: like me) you don’t want to snapshot the root partition, then this can be nonsensical.
Rather, for me, there are two main and one minor reasons to use subvolumes:
- The top-level subvolumes are named @, @home, &c, and are mounted using fstab.
- The useradd utility has an explicit option to add a user as an explicit subvolume. This is sensible.
- (The minor one) Keeping things like a Virtual Machines directory and the @swap top level volume out of snapshots and copy-on-write.
Note that all this is considered stable in BTRFS. By contrast management of multiple disks is not, along with other features such as compression. The Debian wiki is quite explicit about this. The Arch wiki is, as ever, the voice of reason.
The Kalman smoother arises when you have a sequence of \(N\) observations, \(\scalar_1\dots\scalar_N\), and you want to infer a sequence of unobserved states, \(\lambda_1\dots\lambda_N\); the states follow a first order Markov process, and the observations depend on the states (see the diagram below). Explanations of Kalman smoothers tend to start with the filter and then obfuscate it by blurring in a bunch of tricky Gaussian convolutions. Here we start at the top and derive the recursions without worrying about the actual distributions; they can be added later.
State \(i\) trivially depends upon state \(i-1\), but there is a less trivial dependency on state \(i+1\). So the first thing is to integrate it out; this defines the Kalman smoother: \[\begin{aligned} \underbrace{\CondLi{\lambda_i}{\scalar_1\dots\scalar_N}}_{\text{Smoother }i} &= \int d\lambda_{i+1}\, \CondLi{\lambda_i}{\lambda_{i+1},\scalar_1\dots\scalar_N} \CondLi{\lambda_{i+1}}{\scalar_1\dots\scalar_N}, \\ &= \int d\lambda_{i+1}\, \CondLi{\lambda_i}{\lambda_{i+1},\scalar_1\dots\scalar_i} \underbrace{\CondLi{\lambda_{i+1}}{\scalar_1\dots\scalar_N}}_{\text{Smoother }i+1}.\end{aligned}\] So it’s recursive in the smoother term. Notice the conditional independence in the second line: given state \(i+1\), we don’t need the observations after that. The inference is the wrong way around in the first term though, so \[\begin{aligned} \CondLi{\lambda_i}{\lambda_{i+1},\scalar_1\dots\scalar_i} &= \frac{ \CondLi{\lambda_{i+1}}{\lambda_i,\scalar_1\dots\scalar_i} \CondLi{\lambda_i}{\scalar_1\dots\scalar_i} }{ \CondLi{\lambda_{i+1}}{\scalar_1\dots\scalar_i} }, \\ &= \frac{ \CondLi{\lambda_{i+1}}{\lambda_i}\CondLi{\lambda_i}{\scalar_1\dots\scalar_i} }{ \CondLi{\lambda_{i+1}}{\scalar_1\dots\scalar_i} }.\end{aligned}\] That final term in the numerator is now similar to the original question, but independent of future observations; that is the Kalman filter. It is evaluated by considering the current observation: \[\begin{aligned} \underbrace{\CondLi{\lambda_i}{\scalar_1\dots\scalar_i}}_{\text{Filter }i} &= \frac{ \CondLi{\scalar_i}{\lambda_i,\scalar_1\dots\scalar_{i-1}} \CondLi{\lambda_i}{\scalar_1\dots\scalar_{i-1}} }{\CondLi{\scalar_i}{\scalar_1\dots\scalar_{i-1}}}, \\ &= \frac{ \CondLi{\scalar_i}{\lambda_i} \CondLi{\lambda_i}{\scalar_1\dots\scalar_{i-1}} }{\CondLi{\scalar_i}{\scalar_1\dots\scalar_{i-1}}}.\end{aligned}\] Again, some observations become redundant given knowledge of state \(i\).
The final term in the numerator is the Kalman predictor; it is evaluated using the trivial relationship between states \(i\) and \(i-1\): \[\begin{aligned} \underbrace{\CondLi{\lambda_i}{\scalar_1\dots\scalar_{i-1}}}_{\text{Predictor }i} &= \int d\lambda_{i-1}\, \CondLi{\lambda_i}{\lambda_{i-1},\scalar_1\dots\scalar_{i-1}} \CondLi{\lambda_{i-1}}{\scalar_1\dots\scalar_{i-1}}, \\ &= \int d\lambda_{i-1}\, \CondLi{\lambda_i}{\lambda_{i-1}} \underbrace{\CondLi{\lambda_{i-1}}{\scalar_1\dots\scalar_{i-1}}}_{\text{Filter }i-1}.\end{aligned}\] So the filter evaluation is recursive given the predictor.
The evaluation of the whole thing involves working through the equations in the opposite order to which they’re derived:
Define an initial predictor, \(\Li{\lambda_0}\), to be some distribution.
Recurse the filter using the predictor from state \(1\) to state \(N\). This yields filter values for each state.
Initialise the smoother using the last filter value; recurse the smoother backwards from state \(N\) to state \(1\).
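The three steps above can be sketched numerically. This is a minimal scalar implementation in pure Python; the model parameters (a, h, q, r) and all names are my own illustrative choices, not from the text. Each Gaussian is carried as a (mean, variance) pair, so each recursion becomes a couple of lines.

```python
def kalman_smooth(ys, a=0.9, h=1.0, q=0.1, r=0.5, m0=0.0, p0=1.0):
    """Scalar linear-Gaussian model (a sketch):
       state:       lambda_i = a * lambda_{i-1} + noise, variance q
       observation: s_i      = h * lambda_i     + noise, variance r
    Returns predictor, filter and smoother as lists of (mean, variance)."""
    N = len(ys)
    pred = []  # predictor i: p(lambda_i | s_1..s_{i-1})
    filt = []  # filter i:    p(lambda_i | s_1..s_i)
    m, p = m0, p0  # the initial predictor distribution
    for y in ys:
        # Predictor: push the previous filter through the state model.
        mp, pp = a * m, a * a * p + q
        pred.append((mp, pp))
        # Filter: fold in the current observation via Bayes' rule.
        k = pp * h / (h * h * pp + r)  # Kalman gain
        m, p = mp + k * (y - h * mp), (1.0 - k * h) * pp
        filt.append((m, p))
    # Smoother: initialised with the last filter, then recursed backwards.
    smth = [None] * N
    smth[N - 1] = filt[N - 1]
    for i in range(N - 2, -1, -1):
        mf, pf = filt[i]
        mp, pp = pred[i + 1]
        g = pf * a / pp  # smoother gain
        ms, ps = smth[i + 1]
        smth[i] = (mf + g * (ms - mp), pf + g * g * (ps - pp))
    return pred, filt, smth

pred, filt, smth = kalman_smooth([0.5, 0.2, -0.1, 0.4])
```

Note how the code runs in the opposite order to the derivation: predictor and filter forward together, smoother backward from the final filter value.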
Your project is about spamming things; you kind of know a-priori that
in C++ it’d be a library called libspam; however, you’re using python
for some reason. To this end, start with a simple script called
spam.py
in a directory called spam
.
spam
└── spam.py
spam.py
is just a script; it doesn’t have any functions.
After a little while, the spamming gets quite large. You need another
script called bar.py
that also spams, but differently, so you want
to split the spamming out to a library.
To do this, move the common code into functions in the same file
called spam.py
, but move the different bits into distinct scripts:
spam
├── bar.py
├── foo.py
└── spam.py
So, spam.py
contains a common function called spam()
.
Then in foo.py
you can say
import spam
and it will find your new library and import it. In python-speak,
spam.py
is now a module. Python will find it even without setting
PYTHONPATH
. However, your function is now accessed as
spam.spam()
, which is stupid. So, this is better:
from spam import *
Or this:
from spam import spam
Now it’s just spam()
.
After a while, spam.py
might get quite big, including Spam
and
Eggs
classes (classes are CamelCase in python). So you tend to want
to split it up. The first thing might be to put the eggy bits into a
file called eggs.py
:
spam
├── bar.py
├── foo.py
├── spam.py
└── eggs.py
However, there’s now no relationship between these files as far as Python is concerned. To create one, put them both in a directory like this:
spam
├── bar.py
├── foo.py
└── spam
├── __init__.py
├── spam.py
└── eggs.py
In python speak, spam
is now a package. __init__.py
can be
empty; it just means “this is a package” to the python interpreter.
import spam
still finds everything. Python will still find it
without setting PYTHONPATH
.
There are now a lot of things called spam
though. You can say
(somewhat absurdly)
import spam
s = spam.spam.Spam()
It’s tricky to see a way around this.
As a C++ programmer, one sensible way seems to be to put classes in
files named for the class, so there’s a class Spam
in a file
spam.py
in the directory spam
(as implied above). Then put this
in __init__.py
:
from .spam import *
from .eggs import *
And in foo.py
you can say
from spam import Spam
from spam import Eggs
s = Spam()
e = Eggs()
So, the __init__.py
file can squash some of the namespace madness
that the file hierarchy introduces. Python people don’t appear to
think this is sensible though. They would have you put related
classes into one file. This has the side-effect of leading to big
files that are tricky to navigate.
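To make the layout concrete, here is a sketch that builds the final spam package in a scratch directory and imports it. The file contents and the shout method are invented for illustration; note the explicit relative imports in __init__.py, which Python 3 requires.

```python
import os
import sys
import tempfile

# Build the package layout from the text in a scratch directory:
#   root/
#   └── spam/
#       ├── __init__.py   (re-exports Spam and Eggs)
#       ├── spam.py       (class Spam)
#       └── eggs.py       (class Eggs)
root = tempfile.mkdtemp()
pkg = os.path.join(root, "spam")
os.makedirs(pkg)

with open(os.path.join(pkg, "spam.py"), "w") as f:
    f.write("class Spam:\n    def shout(self):\n        return 'spam!'\n")
with open(os.path.join(pkg, "eggs.py"), "w") as f:
    f.write("class Eggs:\n    def shout(self):\n        return 'eggs!'\n")
# __init__.py squashes the namespace; Python 3 needs the dots.
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("from .spam import *\nfrom .eggs import *\n")

sys.path.insert(0, root)
from spam import Spam, Eggs  # no spam.spam.Spam needed

print(Spam().shout())  # spam!
print(Eggs().shout())  # eggs!
```

The sys.path manipulation stands in for what installing the package, or running a script from the parent directory, would do for you.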
I get asked English grammar questions sometimes. They are usually difficult for the same reason that Feynman’s differentiating under the integral worked: the person asking has already used all the tools to which they have access, and failed. I don’t have Feynman’s method though; they’re just tricky questions.
I also get asked to proofread things a lot.
We tend to learn that the second of two verbs is in the infinitive; it’s not entirely true. The second of two verbs is actually likely to be a noun phrase forming the object of a transitive verb. There are (at least) three general ways to do this:
These are all grammatically correct, albeit with slightly different nuances. However, consider this example:
In this case the first is wrong, but it’s not obvious why. The second and third are correct. It’s the third that explains why the first is wrong: the subject of the second verb is different to that of the first (you instead of I); the subjects were the same in the “like” case. The infinitive cannot express that, even though it’s clear from the use of “recommend”. The gerund one is OK because gerunds don’t have subjects.
This tends to happen with verbs like “recommend” and “allow” in a passive sense. In these cases the right answer is often to use the verb in an explicit nominal form:
There is a school of thought that XML is a be-all-and-end-all file format, suitable for anything requiring any kind of structured text file. This is OK when the text to be marked up is, well, text. Like this document. What bothers me is that there is a very common class of file that stores essentially key-value pairs. Typically this is configuration information for a program. Crucially, it’s also typically information that requires hand editing.
An example is assigning values to variables. In a programming language, it might be done like this:
Value = 1;
Colour = "red";
where you don’t actually need the equals signs, the quotes or the semi-colons. In XML, this tends to come out like this:
<Value>1</Value>
<Colour>red</Colour>
or this
<entry Value="1" Colour="red" />
or this
<entry key="Value" value="1" />
<entry key="Colour" value="red" />
or, even
<entry>
<key>Value</key>
<value>1</value>
</entry>
<entry>
<key>Colour</key>
<value>red</value>
</entry>
In fact, if you take the view that the content should still be there when the markup is removed then that last one is the “right” one.
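The verbosity shows on the reading side too. Here is a sketch of reading that “right” form back with Python’s standard library; the surrounding config element is my own addition to make the snippet well-formed.

```python
import xml.etree.ElementTree as ET

# The "right" XML form from above, wrapped in a root element.
doc = """<config>
  <entry><key>Value</key><value>1</value></entry>
  <entry><key>Colour</key><value>red</value></entry>
</config>"""

# Walk the <entry> elements and rebuild a plain dict of pairs.
pairs = {entry.findtext("key"): entry.findtext("value")
         for entry in ET.fromstring(doc).iter("entry")}
print(pairs)  # {'Value': '1', 'Colour': 'red'}
```

Four nested elements per pair, just to recover a dictionary.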
There are a whole bunch of far more suitable formats for key-value pairs. One of the most persuasive (to me) is the .ini format that used to come with Windows:
# This is a comment
[Section]
Value = 1
Colour = red
In fact, it’s this format that’s also used by Torvalds in git:
[core]
repositoryformatversion = 0
filemode = true
[remote "origin"]
url = git+ssh://www.site.org/blah.git
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
remote = origin
merge = refs/heads/master
ALSA has a nice enough format:
pcm.!default {
type plug
slave.pcm "softvol"
}
pcm.softvol {
type softvol
slave {
pcm hw:UA25EX
}
control {
name "SoftMaster"
card 1
}
}
Not even Timber Nerdsley uses XML for key-value pairs. CSS is key-value pairs and looks like this:
body {
margin-left: 3em;
margin-right: 3em;
margin-top: 3em;
margin-bottom: 3em;
font-family: sans-serif;
}
img.mugshot {
margin-right: 1ex;
}
Even the guys who write XML Schema acknowledge that XML is not the thing to use to write it: http://en.wikipedia.org/wiki/RELAX_NG
Edit: (years later) JSON is the thing, it’s awesome. Lube speaks JSON natively.
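The same pairs in JSON are equally painless; a quick sketch with the standard json module (the snippet is invented for illustration):

```python
import json

# The familiar key-value pairs, as JSON.
cfg = json.loads('{"Value": 1, "Colour": "red"}')
print(cfg["Colour"])  # red
```

JSON even preserves the types: Value comes back as an integer, not a string.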