-rw-r--r-- | notes/faq.org | 28 |
-rw-r--r-- | notes/twitter.org | 68 |
2 files changed, 0 insertions, 96 deletions
diff --git a/notes/faq.org b/notes/faq.org
deleted file mode 100644
index 51b6845..0000000
--- a/notes/faq.org
+++ /dev/null
@@ -1,28 +0,0 @@
-#+Title: F.A.Q
-#+Author: franck.cuny@gmail.com
-#+TAGS: @twitter(t)
-#+TAGS: ops linux tools
-
-* IPMI :@twitter:ops:
-#+BEGIN_SRC sh
-ssh ipmibastion1.atla.twitter.com
-console.sh <host>
-#+END_SRC
-
-More on the [[https://confluence.twitter.biz/display/DCE/IPMI%2BRemote%2BConsole%2Bconnectivity][wiki]].
-* Linux :linux:
-** Namespaces
-*** List of containers on Mesos :@twitter:ops:
-To get a list of containers on a host, you can run the following command:
-
-#+BEGIN_SRC sh
-ip netns list | xargs -I {} cat /proc/{}/cgroup | grep freezer | cut -f 3 -d '/'
-#+END_SRC
-*** Running netstat in a container :@twitter:ops:
-On shared mesos, the network is divided in namespaces. Running =netstat= will not work, you need to use =ip netns= to be able to run =netstat= in a container. For example:
-
-#+BEGIN_SRC sh
-for c in $(ip netns); do echo $c && sudo ip netns exec $c netstat -ntp | grep 10.71.16.126; done
-#+END_SRC
-
-
diff --git a/notes/twitter.org b/notes/twitter.org
deleted file mode 100644
index c60238f..0000000
--- a/notes/twitter.org
+++ /dev/null
@@ -1,68 +0,0 @@
-#+Title: Twitter
-#+Author: fcuny@twitter.com
-
-* People
-** Cory
-*** TODO Get up to speed on alerts / viz
-*** TODO Work with Philip to take the work / knowledge on load test cluster
-*** TODO Should I setup a 1:1 with him ?
-*** DONE I don't want to split ownership
-** Mahak
-*** DONE Complete feedback
-*** TODO Should we do a 'one day cleanup' where we go through alerts / warnings and prune / fix ?
-*** TODO Running custdevel is getting more and more expensive. What if we were to limit the size of the streams in there ?
-*** TODO We keep adding features (queue model, placement), but this won't be stable for months.
-How do we justify pushing back priority for life cycle. This add a clear benefit right now to the stability of the system.
-*** TODO We don't have the expertise we need on our system.
-For months I've been asking what's the consequence of having many partitions. Either the answer is "it's fine" or "we don't know". In the last few weeks, we've seen issues because of the number of partitions:
-+ IM with zookeeper
-+ moving dataproducts
-We need to stop working on long term solution that might give us benefits, but we're not sure. We need to spend time to understand our current system. Without reporting we are blind. We have no idea how much resources are used, necessary, wasted, etc.
-*** TODO I want internal PRR before we push queue model / placement
-*** TODO Do you have a monthly sync. meeting with ads / ads prediction / search / MH ?
-*** TODO I need to go faster for DLog migration, can I remove 10 hosts from ads prediction in atla ? That would help a lot at this point.
-** Ravi
-*** TODO DC tour as off-site ?
-** Philip
-*** DONE Work with Cory for the load test cluster
-Talk with him so we can get complete documentation, runbooks, tools to monitor/check status, etc.
-* Projects
-** [[https://jira.twitter.biz/browse/PUBSUB-17420][OS 7 Migration]]
-+ [[https://docs.google.com/document/d/1_9JAwCB1BPa-IcYerG9w5VrDpA-swtGZG2-AgXUA8s0/edit][Doc]] for Mahak
-+ update for an aggressive [[https://docs.google.com/document/d/1bv_tGtB2mNgaA5ToQLRseY0lzE2vlALTiM5NraQtSkE/edit#][timeline]].
-+ shared services are in progress ([[https://docs.google.com/document/d/1TVIIgc1mfvghj-cFUX0iINh6SgF1OmRSAvK_b4UjJKU/edit][doc]])
-+ progress for our various services, run ~[[file+emacs:../bin/tw-os7-report][tw-os7-report]]~.
-* Tasks
-** TODO Update Mesos ticket regarding zombie shards
-** TODO Review Dan's doc for zookeeper tasks, and create tickets
-** TODO Add more capacity to data product WP cluster if we don't get the new hosts
-SCHEDULED: <2017-02-17 Fri>
-* Notes
-** Manhattan
-3 different clusters: RO / RW / ZAC
-
-RO: 2 copies, no quorum. Configure the number of buckets (10K). Cluster has a set of mirror set. 2 nodes per mirror set. you can keep adding mirror set.
-
-coord / replicas are 2 processes running on each node. Query goes to coord, there's a consistent hashing to find which replicas has the data. Query both nodes in the mirror set, and fastest one replies.
-
-in RW there's a quorum. At least 2 nodes have to reply on the request.
-
-users create application (dataset)
-
-
-QL: that's how they achieve strong consistency
-
-otherwise it's eventual consistency
-** Hybrid Mesos for messaging services
-** SRE Sync
-*** Tasks
-+ [X] Get exception for Messaging services - this was denied (?)
-+ [ ] Write a doc on what needs to be done to bring kafka / kestrel up to date
-+ [ ] [[https://jira.twitter.biz/browse/CLDS-1384][Mesos zombie ticket]]
-+ [ ] Tickets for deploying faster on COORD
-*** Notes
-+ There's a disconnect in communication regarding the OS migration.
-+ Find who's using Kafka and which library they use
-+ what's next for MOPUB ?
-