- Fixup pod.getPID() to cover race between xToRun() and stage1 getting
around to writing the pid file.
- Rework `rkt enter` to retrieve the pid in rkt/stage0 and supply it to
stage1 enter.
- Rework stage1 enter to consume pid from argv instead of opening it
itself, so as to not have to duplicate the same race coverage.
- Some stage1 enter code cleanups thrown in for good measure, particularly
around the argv forwarding copy, which started simple but had become unwieldy.
When using overlay, the stage1 filesystem is mounted in a separate mount
namespace, so rkt can't access the gc binary.
We now get the gc binary from the tree cache, using the stage1 image
the user specified when preparing/running the container.
Also remove --no-idle from metadata-service since it
is no longer required. It was really only useful with
--spawn-metadata-svc but also racy.
Fixes #724
Ports defined in the app manifest can be exposed with the
--port=name:host-port option on the command line.
For example, given an app manifest with this ports entry:
{
    "name": "http",
    "port": 80,
    "protocol": "tcp"
}
rkt run --private-net --port=http:8888 myapp.aci
will forward traffic from the host's TCP port 8888 to
the container's port 80.
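Handling of the --port value can be sketched as two steps: split "name:host-port", then look the name up among the app manifest's ports. The type and function names (portFwd, manifestPort, forwardRule, parsePortFwd, resolvePorts) are hypothetical; this is an illustration under those assumptions, not rkt's implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// portFwd is one parsed --port=name:host-port value (hypothetical type).
type portFwd struct {
	Name     string // port name from the app manifest, e.g. "http"
	HostPort uint16 // host port to forward from, e.g. 8888
}

// manifestPort mirrors the fields of an app manifest "ports" entry.
type manifestPort struct {
	Name     string
	Port     uint16
	Protocol string
}

// forwardRule is what the networking code would need to set up forwarding.
type forwardRule struct {
	Protocol string
	HostPort uint16
	PodPort  uint16
}

// parsePortFwd splits "name:host-port" and validates the host port.
func parsePortFwd(s string) (portFwd, error) {
	parts := strings.SplitN(s, ":", 2)
	if len(parts) != 2 || parts[0] == "" {
		return portFwd{}, fmt.Errorf("%q is not of the form name:host-port", s)
	}
	p, err := strconv.ParseUint(parts[1], 10, 16)
	if err != nil || p == 0 {
		return portFwd{}, fmt.Errorf("invalid host port in %q", s)
	}
	return portFwd{Name: parts[0], HostPort: uint16(p)}, nil
}

// resolvePorts maps each requested name to the matching manifest port.
func resolvePorts(fwds []portFwd, ports []manifestPort) ([]forwardRule, error) {
	var rules []forwardRule
	for _, f := range fwds {
		found := false
		for _, mp := range ports {
			if mp.Name == f.Name {
				rules = append(rules, forwardRule{Protocol: mp.Protocol, HostPort: f.HostPort, PodPort: mp.Port})
				found = true
				break
			}
		}
		if !found {
			return nil, fmt.Errorf("port %q is not defined in the app manifest", f.Name)
		}
	}
	return rules, nil
}

func main() {
	f, err := parsePortFwd("http:8888")
	if err != nil {
		panic(err)
	}
	rules, err := resolvePorts([]portFwd{f}, []manifestPort{{Name: "http", Port: 80, Protocol: "tcp"}})
	if err != nil {
		panic(err)
	}
	for _, r := range rules {
		fmt.Printf("%s host:%d -> container:%d\n", r.Protocol, r.HostPort, r.PodPort)
	}
}
```

Referring to ports by manifest name keeps the command line independent of the container-side port number, which stays defined in one place.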
Fixes #624
When not using overlay fs, the tree cache was not getting populated.
Since we now take the enter binary from the tree cache, this was
breaking rkt enter.
We now populate stage1's tree cache in every case. We don't do it for
the app images because it can add a significant amount of time to the
first run if the image is big, and it's not needed unless you use overlay
fs.
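The rendering policy above reduces to a small decision. A minimal sketch, with the function name imagesToRender and the plain string image identifiers as assumptions for illustration:

```go
package main

import "fmt"

// imagesToRender is hypothetical. stage1 is always rendered into the tree
// cache, because the enter (and gc) binaries are now taken from there; app
// images are rendered only when overlay fs will use them as lower layers.
func imagesToRender(stage1 string, apps []string, useOverlay bool) []string {
	out := []string{stage1}
	if useOverlay {
		out = append(out, apps...)
	}
	return out
}

func main() {
	fmt.Println(imagesToRender("stage1", []string{"app1", "app2"}, false)) // [stage1]
	fmt.Println(imagesToRender("stage1", []string{"app1", "app2"}, true))  // [stage1 app1 app2]
}
```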
Setting the permissions of the overlay, upper and work dirs to 0700 gives
"/" in the container 0700 permissions, which breaks images that try
to execute files as different users.
When using overlay fs, rkt status stopped working because it searched for
the status file in the stage1 rootfs, which is either in a different
mount namespace or unmounted.
If the container uses overlay fs, we now search in its upper layer,
which is accessible outside the mount namespace or when the filesystem
is not mounted.
When using overlay, the stage1 filesystem is mounted in a separate mount
namespace, so rkt can't access the enter binary.
We now get the enter binary from the tree cache, using the stage1 image
the user specified when preparing/running the container.
This mounts stage1 and the application images as an overlay filesystem
using each ACI's cached tree as the lower filesystem.
Also, the mounts are done in a separate mount namespace, so they are
unmounted when the container exits and are not visible to the rest
of the system.
Systems that don't support overlay fall back to plain copying.
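The two pieces involved, detecting overlay support and building the mount options, can be sketched as follows. Assumptions: the helper names hasOverlay and overlayOptions are hypothetical, support is detected by looking for "overlay" in the contents of /proc/filesystems, and the example paths are made up. The resulting string is the data argument for mount(2) with fstype "overlay"; on Linux that would be syscall.Mount("overlay", target, "overlay", 0, opts), issued after syscall.Unshare(syscall.CLONE_NEWNS) so the mount stays private, which is left out here because it needs root.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasOverlay reports whether the given /proc/filesystems contents list an
// "overlay" filesystem. Each line is "[nodev]\t<fstype>".
func hasOverlay(procFilesystems string) bool {
	sc := bufio.NewScanner(strings.NewReader(procFilesystems))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) > 0 && fields[len(fields)-1] == "overlay" {
			return true
		}
	}
	return false
}

// overlayOptions builds the overlay mount data: lowerdir is the read-only
// cached tree, upperdir receives writes, workdir is overlay's scratch space
// (it must be empty and on the same filesystem as upperdir).
func overlayOptions(lower, upper, work string) string {
	return fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s", lower, upper, work)
}

func main() {
	// Sample /proc/filesystems contents; a real caller would read the file.
	sample := "nodev\tsysfs\nnodev\tproc\n\text4\nnodev\toverlay\n"
	if hasOverlay(sample) {
		// Hypothetical paths for a stage1 tree and its per-container layers.
		fmt.Println(overlayOptions("/cache/tree/stage1/rootfs", "/pod/upper", "/pod/work"))
	} else {
		fmt.Println("overlay unsupported: fall back to copying the tree")
	}
}
```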
By using systemd's Standard{Input,Output,Error} options we set
/dev/console in stage1 as the tty for the app (see systemd.exec(5)).
This makes interactive executables like bash work with rkt run (or
prepare+run-prepared).
This is only supported if the container has only one app.
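The options named above are standard systemd.exec(5) settings. The sketch below shows what an app unit's [Service] section could gain in the single-app case; the function consoleOptions is hypothetical, TTYPath=/dev/console is spelled out for clarity, and leaving multi-app pods untouched is an assumption that simply mirrors the restriction stated above.

```go
package main

import (
	"fmt"
	"strings"
)

// consoleOptions returns [Service] lines wiring an app's stdio to the
// stage1 console. It is a hypothetical helper, not rkt's unit generator.
func consoleOptions(numApps int) []string {
	if numApps != 1 {
		// Several apps cannot share one interactive console, so this
		// sketch leaves the systemd defaults in place.
		return nil
	}
	return []string{
		"StandardInput=tty",
		"StandardOutput=tty",
		"StandardError=tty",
		"TTYPath=/dev/console",
	}
}

func main() {
	fmt.Println(strings.Join(consoleOptions(1), "\n"))
}
```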
Also renamed --inherit-environment to --inherit-env. The former is too
verbose for consistent use alongside --set-env; since --set-env is meant to
be repeated for multiple variables, both are abbreviated uniformly.
This incarnation of --set-env applies the variables globally to all apps;
it seems desirable to be able to target specific apps with them.
Limiting inheritance to specific apps may also be useful.
`rkt run` and `rkt prepare` can now receive arguments on the commandline
which get appended to the default exec arguments of the preceding app
image.
Examples:
Append --foo=bar to the second aci in a two-aci invocation:
`rkt run bar.aci foo.aci -- --foo=bar`
Append options to both acis:
`rkt run bar.aci -- --foobar --- foo.aci -- --woot`
or if preparing:
`rkt prepare bar.aci -- --foobar --- foo.aci -- --woot`
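The "--" / "---" grammar in these examples can be parsed with a small state machine: "--" starts arguments for the preceding image and "---" ends them. A sketch with hypothetical names (appArgs, parseAppArgs); treating a "---" outside an argument list as an error is an assumption, not something stated above.

```go
package main

import (
	"errors"
	"fmt"
)

// appArgs pairs an image with extra arguments appended to its exec.
type appArgs struct {
	Image string
	Args  []string
}

// parseAppArgs splits IMAGE [-- ARG... [---]] IMAGE ... into per-image args.
func parseAppArgs(argv []string) ([]appArgs, error) {
	var apps []appArgs
	inArgs := false
	for _, a := range argv {
		switch {
		case inArgs && a == "---":
			inArgs = false // back to reading images
		case inArgs:
			last := &apps[len(apps)-1]
			last.Args = append(last.Args, a)
		case a == "--":
			if len(apps) == 0 {
				return nil, errors.New(`"--" must follow an image`)
			}
			inArgs = true
		case a == "---":
			return nil, errors.New(`"---" without a preceding "--"`)
		default:
			apps = append(apps, appArgs{Image: a})
		}
	}
	return apps, nil
}

func main() {
	apps, err := parseAppArgs([]string{"bar.aci", "--", "--foobar", "---", "foo.aci", "--", "--woot"})
	if err != nil {
		panic(err)
	}
	for _, app := range apps {
		fmt.Printf("%s %v\n", app.Image, app.Args)
	}
}
```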
Also lays groundwork for general support of CRM overrides, though only argument
appending has been plumbed to the rkt commandline.
Fixes #564
`rkt prepare` does the "prepare" portion of `rkt run`, outputting a uuid:
$ rkt prepare imgs/pauser.aci
23def438-d2ad-401e-8b52-6ebc49813180
$
The prepared container is displayed in `rkt list`:
$ rkt list
UUID ACI STATE
23def438-d2ad-401e-8b52-6ebc49813180 pauser prepared
$
Instantly run the prepared container passing the uuid to `rkt run-prepared`:
$ rkt --debug run-prepared 23def
2015/03/06 19:54:23 Pivoting to filesystem /var/lib/rkt/containers/run/23def438-d2ad-401e-8b52-6ebc49813180
2015/03/06 19:54:23 Execing /init
Spawning container rootfs on /var/lib/rkt/containers/run/23def438-d2ad-401e-8b52-6ebc49813180/stage1/rootfs.
Press ^] three times within 1s to kill container.
...
Once run via `rkt run-prepared`, the behavior of a prepared container is
identical to that of the usual `rkt run` container.
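The transcript passes an abbreviated uuid ("23def") to `rkt run-prepared`. Resolving such a prefix can be sketched as below; matchUUIDPrefix is a hypothetical helper, and refusing empty or ambiguous prefixes is an assumption about sensible behaviour rather than documented rkt behaviour.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// matchUUIDPrefix resolves an abbreviated uuid against known container uuids.
func matchUUIDPrefix(prefix string, uuids []string) (string, error) {
	if prefix == "" {
		return "", errors.New("empty uuid prefix")
	}
	var matches []string
	for _, u := range uuids {
		if strings.HasPrefix(u, prefix) {
			matches = append(matches, u)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no container matches %q", prefix)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("%q is ambiguous: %d containers match", prefix, len(matches))
	}
}

func main() {
	known := []string{"23def438-d2ad-401e-8b52-6ebc49813180"}
	fmt.Println(matchUUIDPrefix("23def", known))
}
```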
--volume flags you would normally supply to `rkt run` to influence the
behavior of the container are instead supplied to `rkt prepare`, since
these are applied in the prepare phase. --spawn-metadata and
--private-net continue to be specified at run(-prepared) time.
The split of flags may change; this is just the natural fit at the moment.
Fixes #550
Introduces new states for a container directory. The full set is now:
  embryo:           containers/embryo/$uuid
  preparing:        containers/prepare/$uuid & x-locked
  prepare-failed:   containers/prepare/$uuid & unlocked
  prepared:         containers/prepared/$uuid
  running:          containers/run/$uuid & x-locked
  exited:           containers/run/$uuid & unlocked
  exited-garbage:   containers/exited-garbage/$uuid & unlocked
  exited-deleting:  containers/exited-garbage/$uuid & x-locked
  garbage:          containers/garbage/$uuid & unlocked
  deleting:         containers/garbage/$uuid & x-locked
Some of these states overlap: exited-garbage and exited-deleting, for
example, both imply exited.
For a simple `rkt run` invocation, the container never enters the prepared
state; instead, it transitions directly from preparing to running.
For the split `rkt prepare` and `rkt run-prepared` invocation the
container enters the prepared state between the two.
When a container is first created, it starts in the embryo/ directory.
This allows us to acquire the x-lock before renaming it into prepare/, so
it's _always_ x-locked when in the preparing state in the prepare/
directory, making it safe to treat any unlocked directory within prepare/
as failed/aborted (for gc purposes). embryo/ is effectively a stage where
the directory is created in isolation, locked, then brought into the world
via rename into prepare/.
The prepared/ directory is where perfectly good container directories,
successfully prepared, await their run-prepared. When one is eventually run,
we acquire the exclusive lock while it is still in prepared/, where the lock
has no significance, before renaming it into run/, where everything must be
locked or it's eligible for gc (exited).
What used to be garbage/ is now exited-garbage/. garbage/ is now the
garbage directory for prepare-failed and abandoned prepared containers,
without the exited implication. Both are serviced by `rkt gc`.
UUID and container directory generation has been moved out of stage0/run.go
and into rkt/containers.go, including the lock acquisition for `rkt run`.
This has mostly been done to facilitate the split prepare and run-prepared
feature, fixing holes in the container creation lock coverage (embryo
required) while at it.
See Documentation/container-lifecycle.md for more complete details.
Currently the annotation "coreos.com/rocket/stage1/init" represents the stage1
entrypoint used by `rkt run`. Renaming to "coreos.com/rocket/stage1/run" is
more consistent and self-documenting, aligning with the `rkt enter` entrypoint
annotation "coreos.com/rocket/stage1/enter".
Take two at getting the spec vendored into Rocket with Godep.
Since actool is used during the construction of the stage1.aci, it
really needs to be vendored too, to prevent any unexpected divergence
from whatever version the user happens to have in their PATH. Thus,
we introduce a silly dummy package (stage1/dummy.go) to coerce Godep
into vendoring actool. This also requires a slight rearrangement of the
appc repo, moving some functionality from actool itself into the aci
package.
One may now specify an alternative stage1 in a style like run and fetch:
rkt run --stage1-image foo.com/rocket/stage1 app
--stage1-image defaults to "stage1.aci" within the same directory as the rkt
binary. This is discovered at runtime via "/proc/self/exe"; as long as the rkt
executable and stage1.aci share a directory it should "just work" regardless of
the directory's location and where rkt is executed from.
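The default lookup described above can be sketched in a few lines. defaultStage1 is a hypothetical helper name; reading the /proc/self/exe symlink is taken from the text, and the result depends only on where the binary lives, not on the current working directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// defaultStage1 returns "stage1.aci" in the same directory as exe.
func defaultStage1(exe string) string {
	return filepath.Join(filepath.Dir(exe), "stage1.aci")
}

func main() {
	// /proc/self/exe is a symlink to the running binary (Linux only).
	exe, err := os.Readlink("/proc/self/exe")
	if err != nil {
		fmt.Println("cannot resolve /proc/self/exe:", err)
		return
	}
	fmt.Println(defaultStage1(exe))
}
```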