-rw-r--r-- notes/faq.org     | 16
-rw-r--r-- notes/twitter.org | 13
2 files changed, 28 insertions(+), 1 deletion(-)
diff --git a/notes/faq.org b/notes/faq.org
index 73d43c0..51b6845 100644
--- a/notes/faq.org
+++ b/notes/faq.org
@@ -10,3 +10,19 @@ console.sh <host>
 #+END_SRC
 
 More on the [[https://confluence.twitter.biz/display/DCE/IPMI%2BRemote%2BConsole%2Bconnectivity][wiki]].
+* Linux                                                               :linux:
+** Namespaces
+*** List of containers on Mesos                              :@twitter:ops:
+To get a list of containers on a host, you can run the following command:
+
+#+BEGIN_SRC sh
+ip netns list | xargs -I {} cat /proc/{}/cgroup | grep freezer | cut -f 3 -d '/'
+#+END_SRC
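The pipeline above assumes each namespace is named after a PID, and that the container id is the third =/=-separated field of the =freezer= line in =/proc/<pid>/cgroup=. A minimal sketch of just the parsing step, using a made-up cgroup line (the =mesos= path and container id are hypothetical):

```shell
# A cgroup line as it might appear in /proc/<pid>/cgroup
# (the mesos path and container id are made up for illustration)
line='7:freezer:/mesos/2b7e1582-executor'
# grep keeps only the freezer line; cut takes the 3rd '/'-separated
# field, which is the container id
echo "$line" | grep freezer | cut -f 3 -d '/'
```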
+*** Running netstat in a container                           :@twitter:ops:
+On shared Mesos hosts, the network is divided into namespaces, so running =netstat= directly on the host will not show a container's connections. You need =ip netns exec= to run =netstat= inside a container's namespace. For example:
+
+#+BEGIN_SRC sh
+for c in $(ip netns); do echo "$c" && sudo ip netns exec "$c" netstat -ntp | grep 10.71.16.126; done
+#+END_SRC
+
+
diff --git a/notes/twitter.org b/notes/twitter.org
index 7c520f5..6aee604 100644
--- a/notes/twitter.org
+++ b/notes/twitter.org
@@ -1,7 +1,6 @@
 #+Title: Twitter
 #+Author: fcuny@twitter.com
 
-
 * People
 ** Cory
 *** TODO Get up to speed on alerts / viz
@@ -12,11 +11,23 @@
 *** DONE Complete feedback
 *** TODO Should we do a 'one day cleanup' where we go through alerts / warnings and prune / fix?
 *** TODO Running custdevel is getting more and more expensive. What if we were to limit the size of the streams in there?
+*** TODO We keep adding features (queue model, placement), but this won't be stable for months.
+How do we justify pushing back the priority of life cycle? It would add a clear benefit to the stability of the system right now.
+*** TODO We don't have the expertise we need on our system. 
+For months I've been asking what the consequences of having many partitions are. The answer is either "it's fine" or "we don't know". In the last few weeks, we've seen issues caused by the number of partitions:
++ IM with zookeeper
++ moving dataproducts
+We need to stop working on long-term solutions that might give us benefits we're not sure of, and spend time understanding our current system. Without reporting we are blind: we have no idea how many resources are used, necessary, or wasted.
 * Projects
 ** OS 7 Migration
 + updated for an aggressive [[https://docs.google.com/document/d/1bv_tGtB2mNgaA5ToQLRseY0lzE2vlALTiM5NraQtSkE/edit#][timeline]].
++ shared services are in progress ([[https://docs.google.com/document/d/1TVIIgc1mfvghj-cFUX0iINh6SgF1OmRSAvK_b4UjJKU/edit][doc]])
++ for progress on our various services, run ~[[file+emacs:../bin/tw-os7-report][tw-os7-report]]~.
 * Tasks
 ** TODO Update Mesos ticket regarding zombie shards
 ** TODO Review Dan's doc for zookeeper tasks, and create tickets
+** TODO Add more capacity to data product WP cluster if we don't get the new hosts
+SCHEDULED: <2017-02-17 Fri>
 * Notes
 ** Manhattan
+** Hybrid Mesos for messaging services